All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] [PATCH 1/1] tcp-autotuning-on-recv-window-fix
@ 2008-10-21 18:31 Olga Kornievskaia
  2008-10-22 19:46 ` J. Bruce Fields
  0 siblings, 1 reply; 9+ messages in thread
From: Olga Kornievskaia @ 2008-10-21 18:31 UTC (permalink / raw)
  To: linux-nfs

[-- Attachment #1: Type: text/plain, Size: 1 bytes --]



[-- Attachment #2: 0001-autotuning-recv-window-fix.patch --]
[-- Type: text/x-patch, Size: 3417 bytes --]

From: Olga Kornievskaia <aglo@citi.umich.edu>
Date: Tue, 21 Oct 2008 14:13:47 -0400
Subject: [RFC] [PATCH 1/1] tcp-autotuning-on-recv-window-fix

This patch allows for the NFSv4 server to make use of TCP autotuning behaviour
which was previously disabled by setting sk_userlocks variable. 

This patch sets the receive buffers to be big enough to receive the whole 
RPC request. This buffer size had to be set for the listening socket and not
the accept socket as it was previously done. 

This patch removes the code that readjust the receive/send buffer sizes for
the accepted socket. Previously this code was used to influence the TCP
window management behaviour which is no longer needed when autotuning is 
enabled. 

Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu>
---
 net/sunrpc/svcsock.c |   35 +++++++----------------------------
 1 files changed, 7 insertions(+), 28 deletions(-)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 3e65719..4bb535e 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -349,7 +349,6 @@ static void svc_sock_setbufsize(struct socket *sock, unsigned int snd,
 	lock_sock(sock->sk);
 	sock->sk->sk_sndbuf = snd * 2;
 	sock->sk->sk_rcvbuf = rcv * 2;
-	sock->sk->sk_userlocks |= SOCK_SNDBUF_LOCK|SOCK_RCVBUF_LOCK;
 	release_sock(sock->sk);
 #endif
 }
@@ -801,23 +800,6 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp)
 		test_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags),
 		test_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags));
 
-	if (test_and_clear_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags))
-		/* sndbuf needs to have room for one request
-		 * per thread, otherwise we can stall even when the
-		 * network isn't a bottleneck.
-		 *
-		 * We count all threads rather than threads in a
-		 * particular pool, which provides an upper bound
-		 * on the number of threads which will access the socket.
-		 *
-		 * rcvbuf just needs to be able to hold a few requests.
-		 * Normally they will be removed from the queue
-		 * as soon a a complete request arrives.
-		 */
-		svc_sock_setbufsize(svsk->sk_sock,
-				    (serv->sv_nrthreads+3) * serv->sv_max_mesg,
-				    3 * serv->sv_max_mesg);
-
 	clear_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
 
 	/* Receive data. If we haven't got the record length yet, get
@@ -1065,15 +1047,6 @@ static void svc_tcp_init(struct svc_sock *svsk, struct svc_serv *serv)
 
 		tcp_sk(sk)->nonagle |= TCP_NAGLE_OFF;
 
-		/* initialise setting must have enough space to
-		 * receive and respond to one request.
-		 * svc_tcp_recvfrom will re-adjust if necessary
-		 */
-		svc_sock_setbufsize(svsk->sk_sock,
-				    3 * svsk->sk_xprt.xpt_server->sv_max_mesg,
-				    3 * svsk->sk_xprt.xpt_server->sv_max_mesg);
-
-		set_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags);
 		set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
 		if (sk->sk_state != TCP_ESTABLISHED)
 			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
@@ -1143,8 +1116,14 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
 	/* Initialize the socket */
 	if (sock->type == SOCK_DGRAM)
 		svc_udp_init(svsk, serv);
-	else
+	else {
+		/* initialise setting must have enough space to
+		 * receive and respond to one request.
+		 */
+		svc_sock_setbufsize(svsk->sk_sock, 4 * serv->sv_max_mesg,
+					4 * serv->sv_max_mesg);
 		svc_tcp_init(svsk, serv);
+	}
 
 	dprintk("svc: svc_setup_socket created %p (inet %p)\n",
 				svsk, svsk->sk_sk);
-- 
1.5.0.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RFC] [PATCH 1/1] tcp-autotuning-on-recv-window-fix
  2008-10-21 18:31 [RFC] [PATCH 1/1] tcp-autotuning-on-recv-window-fix Olga Kornievskaia
@ 2008-10-22 19:46 ` J. Bruce Fields
  2008-10-22 21:30   ` Dean Hildebrand
                     ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: J. Bruce Fields @ 2008-10-22 19:46 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: linux-nfs

On Tue, Oct 21, 2008 at 02:31:38PM -0400, Olga Kornievskaia wrote:
>
> From: Olga Kornievskaia <aglo@citi.umich.edu>
> Date: Tue, 21 Oct 2008 14:13:47 -0400
> Subject: [RFC] [PATCH 1/1] tcp-autotuning-on-recv-window-fix
> 
> This patch allows for the NFSv4 server to make use of TCP autotuning behaviour
> which was previously disabled by setting sk_userlocks variable. 
> 
> This patch sets the receive buffers to be big enough to receive the whole 
> RPC request. This buffer size had to be set for the listening socket and not
> the accept socket as it was previously done. 

The point there being that our previous buffer-size settings were made
too late to actually have an affect?

> This patch removes the code that readjust the receive/send buffer sizes for
> the accepted socket. Previously this code was used to influence the TCP
> window management behaviour which is no longer needed when autotuning is 
> enabled. 

Could we get a really brief summary of the performance improvement for a
high-speed network, to include in the commit message?

The one remaining worry I recall is that we assume the tcp autotuning
never decreases the size of the buffer below the size we initially
requested.  Apparently that assumption is true.  There's some worry
about whether that's true by design or merely true of the current
implementation.

That doesn't look like a big worry--I'm inclined to apply this patch as
is--but moving the sk_{rcv,snd}buf assignments to a simple function in
the networking code and documenting the requirements there might be a
nice thing to do (as a separate patch).

--b.

> 
> Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu>
> ---
>  net/sunrpc/svcsock.c |   35 +++++++----------------------------
>  1 files changed, 7 insertions(+), 28 deletions(-)
> 
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 3e65719..4bb535e 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -349,7 +349,6 @@ static void svc_sock_setbufsize(struct socket *sock, unsigned int snd,
>  	lock_sock(sock->sk);
>  	sock->sk->sk_sndbuf = snd * 2;
>  	sock->sk->sk_rcvbuf = rcv * 2;
> -	sock->sk->sk_userlocks |= SOCK_SNDBUF_LOCK|SOCK_RCVBUF_LOCK;
>  	release_sock(sock->sk);
>  #endif
>  }
> @@ -801,23 +800,6 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp)
>  		test_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags),
>  		test_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags));
>  
> -	if (test_and_clear_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags))
> -		/* sndbuf needs to have room for one request
> -		 * per thread, otherwise we can stall even when the
> -		 * network isn't a bottleneck.
> -		 *
> -		 * We count all threads rather than threads in a
> -		 * particular pool, which provides an upper bound
> -		 * on the number of threads which will access the socket.
> -		 *
> -		 * rcvbuf just needs to be able to hold a few requests.
> -		 * Normally they will be removed from the queue
> -		 * as soon a a complete request arrives.
> -		 */
> -		svc_sock_setbufsize(svsk->sk_sock,
> -				    (serv->sv_nrthreads+3) * serv->sv_max_mesg,
> -				    3 * serv->sv_max_mesg);
> -
>  	clear_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
>  
>  	/* Receive data. If we haven't got the record length yet, get
> @@ -1065,15 +1047,6 @@ static void svc_tcp_init(struct svc_sock *svsk, struct svc_serv *serv)
>  
>  		tcp_sk(sk)->nonagle |= TCP_NAGLE_OFF;
>  
> -		/* initialise setting must have enough space to
> -		 * receive and respond to one request.
> -		 * svc_tcp_recvfrom will re-adjust if necessary
> -		 */
> -		svc_sock_setbufsize(svsk->sk_sock,
> -				    3 * svsk->sk_xprt.xpt_server->sv_max_mesg,
> -				    3 * svsk->sk_xprt.xpt_server->sv_max_mesg);
> -
> -		set_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags);
>  		set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
>  		if (sk->sk_state != TCP_ESTABLISHED)
>  			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
> @@ -1143,8 +1116,14 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
>  	/* Initialize the socket */
>  	if (sock->type == SOCK_DGRAM)
>  		svc_udp_init(svsk, serv);
> -	else
> +	else {
> +		/* initialise setting must have enough space to

s/initialise/initial/

> +		 * receive and respond to one request.
> +		 */
> +		svc_sock_setbufsize(svsk->sk_sock, 4 * serv->sv_max_mesg,
> +					4 * serv->sv_max_mesg);
>  		svc_tcp_init(svsk, serv);
> +	}
>  
>  	dprintk("svc: svc_setup_socket created %p (inet %p)\n",
>  				svsk, svsk->sk_sk);
> -- 
> 1.5.0.2
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] [PATCH 1/1] tcp-autotuning-on-recv-window-fix
  2008-10-22 19:46 ` J. Bruce Fields
@ 2008-10-22 21:30   ` Dean Hildebrand
  2008-10-22 21:52     ` J. Bruce Fields
  2008-10-22 23:12   ` Jim Rees
  2008-10-23 15:17   ` Olga Kornievskaia
  2 siblings, 1 reply; 9+ messages in thread
From: Dean Hildebrand @ 2008-10-22 21:30 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Olga Kornievskaia, linux-nfs



J. Bruce Fields wrote:
> On Tue, Oct 21, 2008 at 02:31:38PM -0400, Olga Kornievskaia wrote:
>   
>> From: Olga Kornievskaia <aglo@citi.umich.edu>
>> Date: Tue, 21 Oct 2008 14:13:47 -0400
>> Subject: [RFC] [PATCH 1/1] tcp-autotuning-on-recv-window-fix
>>
>> This patch allows for the NFSv4 server to make use of TCP autotuning behaviour
>> which was previously disabled by setting sk_userlocks variable. 
>>
>> This patch sets the receive buffers to be big enough to receive the whole 
>> RPC request. This buffer size had to be set for the listening socket and not
>> the accept socket as it was previously done. 
>>     
>
> The point there being that our previous buffer-size settings were made
> too late to actually have an affect?
>
>   
>> This patch removes the code that readjust the receive/send buffer sizes for
>> the accepted socket. Previously this code was used to influence the TCP
>> window management behaviour which is no longer needed when autotuning is 
>> enabled. 
>>     
>
> Could we get a really brief summary of the performance improvement for a
> high-speed network, to include in the commit message?
>
> The one remaining worry I recall is that we assume the tcp autotuning
> never decreases the size of the buffer below the size we initially
> requested.  Apparently that assumption is true.  There's some worry
> about whether that's true by design or merely true of the current
> implementation.
>   
If it does happen, I assume the fix there is to set the minimum tcp 
buffer settings big enough for a single request?

In fact, the big impact here I believe is that NFS will finally start 
using the linux tcp buffer settings (like every other tool).  Is there 
any way to
get some documentation out there that this is the case?  Maybe an 
addition to the website would be the right place for now?
Dean
> That doesn't look like a big worry--I'm inclined to apply this patch as
> is--but moving the sk_{rcv,snd}buf assignments to a simple function in
> the networking code and documenting the requirements there might be a
> nice thing to do (as a separate patch).
>
> --b.
>
>   
>> Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu>
>> ---
>>  net/sunrpc/svcsock.c |   35 +++++++----------------------------
>>  1 files changed, 7 insertions(+), 28 deletions(-)
>>
>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>> index 3e65719..4bb535e 100644
>> --- a/net/sunrpc/svcsock.c
>> +++ b/net/sunrpc/svcsock.c
>> @@ -349,7 +349,6 @@ static void svc_sock_setbufsize(struct socket *sock, unsigned int snd,
>>  	lock_sock(sock->sk);
>>  	sock->sk->sk_sndbuf = snd * 2;
>>  	sock->sk->sk_rcvbuf = rcv * 2;
>> -	sock->sk->sk_userlocks |= SOCK_SNDBUF_LOCK|SOCK_RCVBUF_LOCK;
>>  	release_sock(sock->sk);
>>  #endif
>>  }
>> @@ -801,23 +800,6 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp)
>>  		test_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags),
>>  		test_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags));
>>  
>> -	if (test_and_clear_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags))
>> -		/* sndbuf needs to have room for one request
>> -		 * per thread, otherwise we can stall even when the
>> -		 * network isn't a bottleneck.
>> -		 *
>> -		 * We count all threads rather than threads in a
>> -		 * particular pool, which provides an upper bound
>> -		 * on the number of threads which will access the socket.
>> -		 *
>> -		 * rcvbuf just needs to be able to hold a few requests.
>> -		 * Normally they will be removed from the queue
>> -		 * as soon a a complete request arrives.
>> -		 */
>> -		svc_sock_setbufsize(svsk->sk_sock,
>> -				    (serv->sv_nrthreads+3) * serv->sv_max_mesg,
>> -				    3 * serv->sv_max_mesg);
>> -
>>  	clear_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
>>  
>>  	/* Receive data. If we haven't got the record length yet, get
>> @@ -1065,15 +1047,6 @@ static void svc_tcp_init(struct svc_sock *svsk, struct svc_serv *serv)
>>  
>>  		tcp_sk(sk)->nonagle |= TCP_NAGLE_OFF;
>>  
>> -		/* initialise setting must have enough space to
>> -		 * receive and respond to one request.
>> -		 * svc_tcp_recvfrom will re-adjust if necessary
>> -		 */
>> -		svc_sock_setbufsize(svsk->sk_sock,
>> -				    3 * svsk->sk_xprt.xpt_server->sv_max_mesg,
>> -				    3 * svsk->sk_xprt.xpt_server->sv_max_mesg);
>> -
>> -		set_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags);
>>  		set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
>>  		if (sk->sk_state != TCP_ESTABLISHED)
>>  			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
>> @@ -1143,8 +1116,14 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
>>  	/* Initialize the socket */
>>  	if (sock->type == SOCK_DGRAM)
>>  		svc_udp_init(svsk, serv);
>> -	else
>> +	else {
>> +		/* initialise setting must have enough space to
>>     
>
> s/initialise/initial/
>
>   
>> +		 * receive and respond to one request.
>> +		 */
>> +		svc_sock_setbufsize(svsk->sk_sock, 4 * serv->sv_max_mesg,
>> +					4 * serv->sv_max_mesg);
>>  		svc_tcp_init(svsk, serv);
>> +	}
>>  
>>  	dprintk("svc: svc_setup_socket created %p (inet %p)\n",
>>  				svsk, svsk->sk_sk);
>> -- 
>> 1.5.0.2
>>
>>     
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>   

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] [PATCH 1/1] tcp-autotuning-on-recv-window-fix
  2008-10-22 21:30   ` Dean Hildebrand
@ 2008-10-22 21:52     ` J. Bruce Fields
  0 siblings, 0 replies; 9+ messages in thread
From: J. Bruce Fields @ 2008-10-22 21:52 UTC (permalink / raw)
  To: Dean Hildebrand; +Cc: Olga Kornievskaia, linux-nfs

On Wed, Oct 22, 2008 at 02:30:47PM -0700, Dean Hildebrand wrote:
>
>
> J. Bruce Fields wrote:
>> On Tue, Oct 21, 2008 at 02:31:38PM -0400, Olga Kornievskaia wrote:
>>   
>>> From: Olga Kornievskaia <aglo@citi.umich.edu>
>>> Date: Tue, 21 Oct 2008 14:13:47 -0400
>>> Subject: [RFC] [PATCH 1/1] tcp-autotuning-on-recv-window-fix
>>>
>>> This patch allows for the NFSv4 server to make use of TCP autotuning behaviour
>>> which was previously disabled by setting sk_userlocks variable. 
>>>
>>> This patch sets the receive buffers to be big enough to receive the 
>>> whole RPC request. This buffer size had to be set for the listening 
>>> socket and not
>>> the accept socket as it was previously done.     
>>
>> The point there being that our previous buffer-size settings were made
>> too late to actually have an affect?
>>
>>   
>>> This patch removes the code that readjust the receive/send buffer sizes for
>>> the accepted socket. Previously this code was used to influence the TCP
>>> window management behaviour which is no longer needed when autotuning 
>>> is enabled.     
>>
>> Could we get a really brief summary of the performance improvement for a
>> high-speed network, to include in the commit message?
>>
>> The one remaining worry I recall is that we assume the tcp autotuning
>> never decreases the size of the buffer below the size we initially
>> requested.  Apparently that assumption is true.  There's some worry
>> about whether that's true by design or merely true of the current
>> implementation.
>>   
> If it does happen, I assume the fix there is to set the minimum tcp  
> buffer settings big enough for a single request?

That might be a workaround.  The fix would be a kernel patch.  That
doesn't look likely.

> In fact, the big impact here I believe is that NFS will finally start  
> using the linux tcp buffer settings (like every other tool).

You're talking about the various sysctls?

Since they're global, I don't think they're really very useful as tools
to tune nfsd.

--b.

> Is there  
> any way to
> get some documentation out there that this is the case?  Maybe an  
> addition to the website would be the right place for now?
> Dean
>> That doesn't look like a big worry--I'm inclined to apply this patch as
>> is--but moving the sk_{rcv,snd}buf assignments to a simple function in
>> the networking code and documenting the requirements there might be a
>> nice thing to do (as a separate patch).
>>
>> --b.
>>
>>   
>>> Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu>
>>> ---
>>>  net/sunrpc/svcsock.c |   35 +++++++----------------------------
>>>  1 files changed, 7 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>>> index 3e65719..4bb535e 100644
>>> --- a/net/sunrpc/svcsock.c
>>> +++ b/net/sunrpc/svcsock.c
>>> @@ -349,7 +349,6 @@ static void svc_sock_setbufsize(struct socket *sock, unsigned int snd,
>>>  	lock_sock(sock->sk);
>>>  	sock->sk->sk_sndbuf = snd * 2;
>>>  	sock->sk->sk_rcvbuf = rcv * 2;
>>> -	sock->sk->sk_userlocks |= SOCK_SNDBUF_LOCK|SOCK_RCVBUF_LOCK;
>>>  	release_sock(sock->sk);
>>>  #endif
>>>  }
>>> @@ -801,23 +800,6 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp)
>>>  		test_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags),
>>>  		test_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags));
>>>  -	if (test_and_clear_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags))
>>> -		/* sndbuf needs to have room for one request
>>> -		 * per thread, otherwise we can stall even when the
>>> -		 * network isn't a bottleneck.
>>> -		 *
>>> -		 * We count all threads rather than threads in a
>>> -		 * particular pool, which provides an upper bound
>>> -		 * on the number of threads which will access the socket.
>>> -		 *
>>> -		 * rcvbuf just needs to be able to hold a few requests.
>>> -		 * Normally they will be removed from the queue
>>> -		 * as soon a a complete request arrives.
>>> -		 */
>>> -		svc_sock_setbufsize(svsk->sk_sock,
>>> -				    (serv->sv_nrthreads+3) * serv->sv_max_mesg,
>>> -				    3 * serv->sv_max_mesg);
>>> -
>>>  	clear_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
>>>   	/* Receive data. If we haven't got the record length yet, get
>>> @@ -1065,15 +1047,6 @@ static void svc_tcp_init(struct svc_sock *svsk, struct svc_serv *serv)
>>>   		tcp_sk(sk)->nonagle |= TCP_NAGLE_OFF;
>>>  -		/* initialise setting must have enough space to
>>> -		 * receive and respond to one request.
>>> -		 * svc_tcp_recvfrom will re-adjust if necessary
>>> -		 */
>>> -		svc_sock_setbufsize(svsk->sk_sock,
>>> -				    3 * svsk->sk_xprt.xpt_server->sv_max_mesg,
>>> -				    3 * svsk->sk_xprt.xpt_server->sv_max_mesg);
>>> -
>>> -		set_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags);
>>>  		set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
>>>  		if (sk->sk_state != TCP_ESTABLISHED)
>>>  			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
>>> @@ -1143,8 +1116,14 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
>>>  	/* Initialize the socket */
>>>  	if (sock->type == SOCK_DGRAM)
>>>  		svc_udp_init(svsk, serv);
>>> -	else
>>> +	else {
>>> +		/* initialise setting must have enough space to
>>>     
>>
>> s/initialise/initial/
>>
>>   
>>> +		 * receive and respond to one request.
>>> +		 */
>>> +		svc_sock_setbufsize(svsk->sk_sock, 4 * serv->sv_max_mesg,
>>> +					4 * serv->sv_max_mesg);
>>>  		svc_tcp_init(svsk, serv);
>>> +	}
>>>   	dprintk("svc: svc_setup_socket created %p (inet %p)\n",
>>>  				svsk, svsk->sk_sk);
>>> -- 
>>> 1.5.0.2
>>>
>>>     
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>   

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] [PATCH 1/1] tcp-autotuning-on-recv-window-fix
  2008-10-22 19:46 ` J. Bruce Fields
  2008-10-22 21:30   ` Dean Hildebrand
@ 2008-10-22 23:12   ` Jim Rees
  2008-10-23 15:17   ` Olga Kornievskaia
  2 siblings, 0 replies; 9+ messages in thread
From: Jim Rees @ 2008-10-22 23:12 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Olga Kornievskaia, linux-nfs

J. Bruce Fields wrote:

  Could we get a really brief summary of the performance improvement for a
  high-speed network, to include in the commit message?

I don't have before and after numbers for this patch.  But an earlier
version boosted write performance from 432 to 15859 KBps on a transatlantic
link with 120ms rtt.  That's with 16 slot table entries.

  The one remaining worry I recall is that we assume the tcp autotuning
  never decreases the size of the buffer below the size we initially
  requested.  Apparently that assumption is true.  There's some worry
  about whether that's true by design or merely true of the current
  implementation.

That's true now but it's mostly an accident of the implementation.

  That doesn't look like a big worry--I'm inclined to apply this patch as
  is--but moving the sk_{rcv,snd}buf assignments to a simple function in
  the networking code and documenting the requirements there might be a
  nice thing to do (as a separate patch).

That's the right thing to do.  NFS shouldn't be reaching in to the sock
struct data fields.  I started to write such a patch but got bogged down.
Should it be a socket function, implemented by all the transports, or a tcp
only function?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] [PATCH 1/1] tcp-autotuning-on-recv-window-fix
  2008-10-22 19:46 ` J. Bruce Fields
  2008-10-22 21:30   ` Dean Hildebrand
  2008-10-22 23:12   ` Jim Rees
@ 2008-10-23 15:17   ` Olga Kornievskaia
  2008-10-23 17:53     ` J. Bruce Fields
  2 siblings, 1 reply; 9+ messages in thread
From: Olga Kornievskaia @ 2008-10-23 15:17 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs



J. Bruce Fields wrote:
> On Tue, Oct 21, 2008 at 02:31:38PM -0400, Olga Kornievskaia wrote:
>   
>> From: Olga Kornievskaia <aglo@citi.umich.edu>
>> Date: Tue, 21 Oct 2008 14:13:47 -0400
>> Subject: [RFC] [PATCH 1/1] tcp-autotuning-on-recv-window-fix
>>
>> This patch allows for the NFSv4 server to make use of TCP autotuning behaviour
>> which was previously disabled by setting sk_userlocks variable. 
>>
>> This patch sets the receive buffers to be big enough to receive the whole 
>> RPC request. This buffer size had to be set for the listening socket and not
>> the accept socket as it was previously done. 
>>     
>
> The point there being that our previous buffer-size settings were made
> too late to actually have an affect?
>   
I would say they didn't have a desired effect. Actually modifying the 
receive/send buffer sizes on the accept socket does have an influence on 
the TCP behavior. I won't claim I understand what happens but from my 
observations I see that it messed up TCP autotuning behavior. It leads 
to an advertised window to be "clamped" at the value set for the 
listening socket.
>> This patch removes the code that readjust the receive/send buffer sizes for
>> the accepted socket. Previously this code was used to influence the TCP
>> window management behaviour which is no longer needed when autotuning is 
>> enabled. 
>>     
>
> Could we get a really brief summary of the performance improvement for a
> high-speed network, to include in the commit message?
>   
Here's a pick from some LAN performance #s: w/o 237479Mb/s => w/ 343669Mb/s.
> The one remaining worry I recall is that we assume the tcp autotuning
> never decreases the size of the buffer below the size we initially
> requested.  Apparently that assumption is true.  There's some worry
> about whether that's true by design or merely true of the current
> implementation.
>
> That doesn't look like a big worry--I'm inclined to apply this patch as
> is--but moving the sk_{rcv,snd}buf assignments to a simple function in
> the networking code and documenting the requirements there might be a
> nice thing to do (as a separate patch).
>   
Are you asking for svc_sock_setbufsize() function in svcsock.c to be 
moved to svc_xprt.c? Why? It really belongs in svcsock.c with the rest 
of the socket management code.
> --b.
>
>   
>> Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu>
>> ---
>>  net/sunrpc/svcsock.c |   35 +++++++----------------------------
>>  1 files changed, 7 insertions(+), 28 deletions(-)
>>
>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>> index 3e65719..4bb535e 100644
>> --- a/net/sunrpc/svcsock.c
>> +++ b/net/sunrpc/svcsock.c
>> @@ -349,7 +349,6 @@ static void svc_sock_setbufsize(struct socket *sock, unsigned int snd,
>>  	lock_sock(sock->sk);
>>  	sock->sk->sk_sndbuf = snd * 2;
>>  	sock->sk->sk_rcvbuf = rcv * 2;
>> -	sock->sk->sk_userlocks |= SOCK_SNDBUF_LOCK|SOCK_RCVBUF_LOCK;
>>  	release_sock(sock->sk);
>>  #endif
>>  }
>> @@ -801,23 +800,6 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp)
>>  		test_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags),
>>  		test_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags));
>>  
>> -	if (test_and_clear_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags))
>> -		/* sndbuf needs to have room for one request
>> -		 * per thread, otherwise we can stall even when the
>> -		 * network isn't a bottleneck.
>> -		 *
>> -		 * We count all threads rather than threads in a
>> -		 * particular pool, which provides an upper bound
>> -		 * on the number of threads which will access the socket.
>> -		 *
>> -		 * rcvbuf just needs to be able to hold a few requests.
>> -		 * Normally they will be removed from the queue
>> -		 * as soon a a complete request arrives.
>> -		 */
>> -		svc_sock_setbufsize(svsk->sk_sock,
>> -				    (serv->sv_nrthreads+3) * serv->sv_max_mesg,
>> -				    3 * serv->sv_max_mesg);
>> -
>>  	clear_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
>>  
>>  	/* Receive data. If we haven't got the record length yet, get
>> @@ -1065,15 +1047,6 @@ static void svc_tcp_init(struct svc_sock *svsk, struct svc_serv *serv)
>>  
>>  		tcp_sk(sk)->nonagle |= TCP_NAGLE_OFF;
>>  
>> -		/* initialise setting must have enough space to
>> -		 * receive and respond to one request.
>> -		 * svc_tcp_recvfrom will re-adjust if necessary
>> -		 */
>> -		svc_sock_setbufsize(svsk->sk_sock,
>> -				    3 * svsk->sk_xprt.xpt_server->sv_max_mesg,
>> -				    3 * svsk->sk_xprt.xpt_server->sv_max_mesg);
>> -
>> -		set_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags);
>>  		set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
>>  		if (sk->sk_state != TCP_ESTABLISHED)
>>  			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
>> @@ -1143,8 +1116,14 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
>>  	/* Initialize the socket */
>>  	if (sock->type == SOCK_DGRAM)
>>  		svc_udp_init(svsk, serv);
>> -	else
>> +	else {
>> +		/* initialise setting must have enough space to
>>     
>
> s/initialise/initial/
>
>   
>> +		 * receive and respond to one request.
>> +		 */
>> +		svc_sock_setbufsize(svsk->sk_sock, 4 * serv->sv_max_mesg,
>> +					4 * serv->sv_max_mesg);
>>  		svc_tcp_init(svsk, serv);
>> +	}
>>  
>>  	dprintk("svc: svc_setup_socket created %p (inet %p)\n",
>>  				svsk, svsk->sk_sk);
>> -- 
>> 1.5.0.2
>>
>>     
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>   

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] [PATCH 1/1] tcp-autotuning-on-recv-window-fix
  2008-10-23 15:17   ` Olga Kornievskaia
@ 2008-10-23 17:53     ` J. Bruce Fields
  2008-10-23 18:34       ` Olga Kornievskaia
  0 siblings, 1 reply; 9+ messages in thread
From: J. Bruce Fields @ 2008-10-23 17:53 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: linux-nfs

On Thu, Oct 23, 2008 at 11:17:37AM -0400, Olga Kornievskaia wrote:
> J. Bruce Fields wrote:
>> Could we get a really brief summary of the performance improvement for a
>> high-speed network, to include in the commit message?
>>   
> Here's a pick from some LAN performance #s: w/o 237479Mb/s => w/ 343669Mb/s.

That's on a 1G network?  (With what ping time?)

> Are you asking for svc_sock_setbufsize() function in svcsock.c to be  
> moved to svc_xprt.c? Why? It really belongs in svcsock.c with the rest  
> of the socket management code.

This fragment:

>>>  	lock_sock(sock->sk);
>>>  	sock->sk->sk_sndbuf = snd * 2;
>>>  	sock->sk->sk_rcvbuf = rcv * 2;
>>> -	sock->sk->sk_userlocks |= SOCK_SNDBUF_LOCK|SOCK_RCVBUF_LOCK;
>>>  	release_sock(sock->sk);

should really be part of the core networking (not sunrpc) code.

That's been a todo for a while.  It doesn't necessarily have to be done
as a prerequisite to this patch.  But we should try to do it.

--b.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] [PATCH 1/1] tcp-autotuning-on-recv-window-fix
  2008-10-23 17:53     ` J. Bruce Fields
@ 2008-10-23 18:34       ` Olga Kornievskaia
  2008-10-23 18:46         ` Olga Kornievskaia
  0 siblings, 1 reply; 9+ messages in thread
From: Olga Kornievskaia @ 2008-10-23 18:34 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs



J. Bruce Fields wrote:
> On Thu, Oct 23, 2008 at 11:17:37AM -0400, Olga Kornievskaia wrote:
>   
>> J. Bruce Fields wrote:
>>     
>>> Could we get a really brief summary of the performance improvement for a
>>> high-speed network, to include in the commit message?
>>>   
>>>       
>> Here's a pick from some LAN performance #s: w/o 237479Mb/s => w/ 343669Mb/s.
>>     
>
> That's on a 1G network?  (With what ping time?)
>   
over 10Gb/s. ping 0.1ms
>> Are you asking for svc_sock_setbufsize() function in svcsock.c to be  
>> moved to svc_xprt.c? Why? It really belongs in svcsock.c with the rest  
>> of the socket management code.
>>     
>
> This fragment:
>
>   
>>>>  	lock_sock(sock->sk);
>>>>  	sock->sk->sk_sndbuf = snd * 2;
>>>>  	sock->sk->sk_rcvbuf = rcv * 2;
>>>> -	sock->sk->sk_userlocks |= SOCK_SNDBUF_LOCK|SOCK_RCVBUF_LOCK;
>>>>  	release_sock(sock->sk);
>>>>         
>
> should really be part of the core networking (not sunrpc) code.
>
> That's been a todo for a while.  It doesn't necessarily have to be done
> as a prerequisite to this patch.  But we should try to do it.
>
> --b.
>
>   

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] [PATCH 1/1] tcp-autotuning-on-recv-window-fix
  2008-10-23 18:34       ` Olga Kornievskaia
@ 2008-10-23 18:46         ` Olga Kornievskaia
  0 siblings, 0 replies; 9+ messages in thread
From: Olga Kornievskaia @ 2008-10-23 18:46 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: J. Bruce Fields, linux-nfs



Olga Kornievskaia wrote:
>
>
> J. Bruce Fields wrote:
>> On Thu, Oct 23, 2008 at 11:17:37AM -0400, Olga Kornievskaia wrote:
>>  
>>> J. Bruce Fields wrote:
>>>    
>>>> Could we get a really brief summary of the performance improvement 
>>>> for a
>>>> high-speed network, to include in the commit message?
>>>>         
>>> Here's a pick from some LAN performance #s: w/o 237479Mb/s => w/ 
>>> 343669Mb/s.
Sorry, wrong units: 237479KB/s and 343669KB/s.


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2008-10-23 18:46 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-10-21 18:31 [RFC] [PATCH 1/1] tcp-autotuning-on-recv-window-fix Olga Kornievskaia
2008-10-22 19:46 ` J. Bruce Fields
2008-10-22 21:30   ` Dean Hildebrand
2008-10-22 21:52     ` J. Bruce Fields
2008-10-22 23:12   ` Jim Rees
2008-10-23 15:17   ` Olga Kornievskaia
2008-10-23 17:53     ` J. Bruce Fields
2008-10-23 18:34       ` Olga Kornievskaia
2008-10-23 18:46         ` Olga Kornievskaia

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.