* [PATCH] NFSD: fix use of setsockopt
@ 2008-06-19 13:33 Olga Kornievskaia
2008-06-25 0:50 ` Dean Hildebrand
2008-06-25 19:37 ` J. Bruce Fields
0 siblings, 2 replies; 12+ messages in thread
From: Olga Kornievskaia @ 2008-06-19 13:33 UTC (permalink / raw)
To: linux-nfs
[-- Attachment #1: Type: text/plain, Size: 562 bytes --]
The following patch fixes the NFS server's use of setsockopt. For this
function to take effect, it needs to be called after socket creation
but before the socket is bound.
This patch also changes the size of the receive socket buffer to be the
same as the send socket buffer. Both buffers are now sized as
(number of nfsd threads + 3) * sv_max_mesg.
This patch fixes the problem that the receive window never opens beyond
the default TCP receive window size set by the 2nd parameter of the
net.ipv4.tcp_rmem sysctl.
Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu>
[-- Attachment #2: nfsd-fix-sockopt-7.patch --]
[-- Type: text/x-patch, Size: 1242 bytes --]
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index c75bffe..178b397 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1191,7 +1191,7 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
*/
svc_sock_setbufsize(svsk->sk_sock,
(serv->sv_nrthreads+3) * serv->sv_max_mesg,
- 3 * serv->sv_max_mesg);
+ (serv->sv_nrthreads+3) * serv->sv_max_mesg);
clear_bit(SK_DATA, &svsk->sk_flags);
@@ -1372,11 +1372,6 @@ svc_tcp_init(struct svc_sock *svsk)
* receive and respond to one request.
* svc_tcp_recvfrom will re-adjust if necessary
*/
- svc_sock_setbufsize(svsk->sk_sock,
- 3 * svsk->sk_server->sv_max_mesg,
- 3 * svsk->sk_server->sv_max_mesg);
-
- set_bit(SK_CHNGBUF, &svsk->sk_flags);
set_bit(SK_DATA, &svsk->sk_flags);
if (sk->sk_state != TCP_ESTABLISHED)
set_bit(SK_CLOSE, &svsk->sk_flags);
@@ -1761,6 +1756,8 @@ static int svc_create_socket(struct svc_serv *serv, int protocol,
if (type == SOCK_STREAM)
sock->sk->sk_reuse = 1; /* allow address reuse */
+ svc_sock_setbufsize(sock, (serv->sv_nrthreads+3) * serv->sv_max_mesg,
+ (serv->sv_nrthreads+3) * serv->sv_max_mesg);
error = kernel_bind(sock, sin, len);
if (error < 0)
goto bummer;
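For illustration, the sizing expression used in the patch can be worked through in a few lines of C. This is only a sketch: svc_bufsize() is a hypothetical helper that does not exist in the kernel, and the thread count and message size used below are assumed values, not measurements from this thread.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not a kernel function) that mirrors the
 * expression in the patch: (sv_nrthreads + 3) * sv_max_mesg.
 */
unsigned long svc_bufsize(unsigned int nrthreads, unsigned long max_mesg)
{
	return ((unsigned long)nrthreads + 3UL) * max_mesg;
}

/* Print the request for a given configuration (illustration only). */
void svc_bufsize_print(unsigned int nrthreads, unsigned long max_mesg)
{
	printf("threads=%u max_mesg=%lu -> %lu bytes\n",
	       nrthreads, max_mesg, svc_bufsize(nrthreads, max_mesg));
}
```

With an assumed 8 nfsd threads and an assumed sv_max_mesg of 1 MiB, svc_bufsize(8, 1048576UL) gives 11534336 bytes for each buffer, compared with 3 * 1048576 = 3145728 bytes for the receive buffer before the patch.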
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] NFSD: fix use of setsockopt
2008-06-19 13:33 [PATCH] NFSD: fix use of setsockopt Olga Kornievskaia
@ 2008-06-25 0:50 ` Dean Hildebrand
2008-06-25 19:40 ` J. Bruce Fields
2008-06-25 19:37 ` J. Bruce Fields
1 sibling, 1 reply; 12+ messages in thread
From: Dean Hildebrand @ 2008-06-25 0:50 UTC (permalink / raw)
To: Olga Kornievskaia; +Cc: linux-nfs
Hi Olga,
This makes sense: if NFSD is going to ignore the global Linux TCP
settings and 'go it alone', then it shouldn't be constrained by them.
At least now we can increase the rcv buffer size by increasing the
number of NFSDs. I would still like to pursue my sysctl patch for the
rcv and snd buffers, though, since we have seen situations where too
many NFSDs can increase the randomness of requests to the underlying
file system, reducing the effectiveness of readahead/write gathering.
Dean
Olga Kornievskaia wrote:
> The following patch fixes NFS server's use of setsockopt. For this
> function to take an effect it first needs be called after socket
> creation but before sock binding.
>
> This patch also changes the size of the receive sock buffer to be same
> as the send sock buffer. Both buffers are now a multiple of maxpayload
> and number of nfsd threads.
>
> This patch fixes the problem that receive window never opens beyond
> the default TCP receive window size set by the 2nd parameter of the
> net.ipv4.tcp_rmem sysctl.
>
> Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu>
------------------------------------------------------------------------
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index c75bffe..178b397 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1191,7 +1191,7 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
*/
svc_sock_setbufsize(svsk->sk_sock,
(serv->sv_nrthreads+3) * serv->sv_max_mesg,
- 3 * serv->sv_max_mesg);
+ (serv->sv_nrthreads+3) * serv->sv_max_mesg);
clear_bit(SK_DATA, &svsk->sk_flags);
@@ -1372,11 +1372,6 @@ svc_tcp_init(struct svc_sock *svsk)
* receive and respond to one request.
* svc_tcp_recvfrom will re-adjust if necessary
*/
- svc_sock_setbufsize(svsk->sk_sock,
- 3 * svsk->sk_server->sv_max_mesg,
- 3 * svsk->sk_server->sv_max_mesg);
-
- set_bit(SK_CHNGBUF, &svsk->sk_flags);
set_bit(SK_DATA, &svsk->sk_flags);
if (sk->sk_state != TCP_ESTABLISHED)
set_bit(SK_CLOSE, &svsk->sk_flags);
@@ -1761,6 +1756,8 @@ static int svc_create_socket(struct svc_serv *serv, int protocol,
if (type == SOCK_STREAM)
sock->sk->sk_reuse = 1; /* allow address reuse */
+ svc_sock_setbufsize(sock, (serv->sv_nrthreads+3) * serv->sv_max_mesg,
+ (serv->sv_nrthreads+3) * serv->sv_max_mesg);
error = kernel_bind(sock, sin, len);
if (error < 0)
goto bummer;
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] NFSD: fix use of setsockopt
2008-06-25 0:50 ` Dean Hildebrand
@ 2008-06-25 19:40 ` J. Bruce Fields
0 siblings, 0 replies; 12+ messages in thread
From: J. Bruce Fields @ 2008-06-25 19:40 UTC (permalink / raw)
To: Dean Hildebrand; +Cc: Olga Kornievskaia, linux-nfs
On Tue, Jun 24, 2008 at 05:50:31PM -0700, Dean Hildebrand wrote:
> Hi Olga,
>
> This makes sense, if NFSD is going to ignore global Linux TCP settings
> and 'go it alone', then it shouldn't be constrained by them.
>
> At least now we can increase the rcv buffer size by increasing the
> number of NFSDs. I would still like to pursue my sysctl patch for the
> rcv and snd buffer though since we have seen situations where too many
> NFSDs can increase the randomness of requests to the underlying file
> system, reducing the effectiveness of readahead/write gathering.
Olga says she's also seeing some performance decrease with increasing
numbers of threads in our 10 gigabit testing, and I was wondering if
something like that could explain the change.
Anyone have ideas how we could measure how ordered our IO requests are?
(Or how much seeking the drives in our raid array are doing?)
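One possible way to put a number on "how ordered" a request stream is (a sketch only, not something proposed in this thread) is to take a trace of request offsets and lengths and count how often a request starts exactly where the previous one ended. The struct io_req type and the sequential_fraction() function below are hypothetical names introduced for illustration, and how the trace is collected is left open.

```c
#include <stddef.h>

/* Hypothetical trace record: one I/O request as seen by the file system. */
struct io_req {
	unsigned long long offset;	/* starting byte offset */
	unsigned long long len;		/* length in bytes */
};

/*
 * Return the fraction of requests (excluding the first) that begin
 * exactly where the previous request ended. 1.0 means the trace is
 * fully sequential; values near 0.0 mean mostly random access.
 * A trace with fewer than two requests is treated as sequential.
 */
double sequential_fraction(const struct io_req *reqs, size_t n)
{
	size_t i, seq = 0;

	if (n < 2)
		return 1.0;

	for (i = 1; i < n; i++) {
		if (reqs[i].offset == reqs[i - 1].offset + reqs[i - 1].len)
			seq++;
	}
	return (double)seq / (double)(n - 1);
}
```

Comparing this fraction across runs with different numbers of nfsd threads would be one rough way to check whether more threads really do make the stream less ordered. It ignores near-sequential requests and per-file interleaving, so it is a coarse indicator at best.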
--b.
>
> Dean
>
> Olga Kornievskaia wrote:
>> The following patch fixes NFS server's use of setsockopt. For this
>> function to take an effect it first needs be called after socket
>> creation but before sock binding.
>>
>> This patch also changes the size of the receive sock buffer to be same
>> as the send sock buffer. Both buffers are now a multiple of maxpayload
>> and number of nfsd threads.
>>
>> This patch fixes the problem that receive window never opens beyond
>> the default TCP receive window size set by the 2nd parameter of the
>> net.ipv4.tcp_rmem sysctl.
>>
>> Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu>
> Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu>
>
> ------------------------------------------------------------------------
>
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index c75bffe..178b397 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -1191,7 +1191,7 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
> */
> svc_sock_setbufsize(svsk->sk_sock,
> (serv->sv_nrthreads+3) * serv->sv_max_mesg,
> - 3 * serv->sv_max_mesg);
> + (serv->sv_nrthreads+3) * serv->sv_max_mesg);
>
> clear_bit(SK_DATA, &svsk->sk_flags);
>
> @@ -1372,11 +1372,6 @@ svc_tcp_init(struct svc_sock *svsk)
> * receive and respond to one request.
> * svc_tcp_recvfrom will re-adjust if necessary
> */
> - svc_sock_setbufsize(svsk->sk_sock,
> - 3 * svsk->sk_server->sv_max_mesg,
> - 3 * svsk->sk_server->sv_max_mesg);
> -
> - set_bit(SK_CHNGBUF, &svsk->sk_flags);
>
> set_bit(SK_DATA, &svsk->sk_flags);
> if (sk->sk_state != TCP_ESTABLISHED)
> set_bit(SK_CLOSE, &svsk->sk_flags);
> @@ -1761,6 +1756,8 @@ static int svc_create_socket(struct svc_serv *serv, int protocol,
>
> if (type == SOCK_STREAM)
> sock->sk->sk_reuse = 1; /* allow address reuse */
> + svc_sock_setbufsize(sock, (serv->sv_nrthreads+3) * serv->sv_max_mesg,
> + (serv->sv_nrthreads+3) * serv->sv_max_mesg);
> error = kernel_bind(sock, sin, len);
> if (error < 0)
> goto bummer;
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] NFSD: fix use of setsockopt
2008-06-19 13:33 [PATCH] NFSD: fix use of setsockopt Olga Kornievskaia
2008-06-25 0:50 ` Dean Hildebrand
@ 2008-06-25 19:37 ` J. Bruce Fields
2008-06-25 20:44 ` Olga Kornievskaia
2008-06-26 17:59 ` Sm-notify Laurenz, Dirk
1 sibling, 2 replies; 12+ messages in thread
From: J. Bruce Fields @ 2008-06-25 19:37 UTC (permalink / raw)
To: Olga Kornievskaia; +Cc: linux-nfs, Neil Brown
On Thu, Jun 19, 2008 at 09:33:39AM -0400, Olga Kornievskaia wrote:
> The following patch fixes NFS server's use of setsockopt. For this
> function to take an effect it first needs be called after socket
> creation but before sock binding.
The tcp(7) man page actually claims that it's listen() and connect()
that matter (so the setsockopt is effective on (and only on) unconnected
sockets), so probably this could go after the bind and before the
listen? Not that it matters.
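The ordering described here can be sketched from userspace as follows. This is an illustration using the ordinary sockets API, not the in-kernel code path that the patch touches; listen_with_rcvbuf() and get_rcvbuf() are hypothetical helper names, and the loopback address and backlog of 16 are arbitrary choices.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create a TCP socket on the loopback address and request a receive
 * buffer of `rcvbuf` bytes after bind() but before listen().
 * Returns the listening descriptor, or -1 on error.
 */
int listen_with_rcvbuf(int rcvbuf, unsigned short port)
{
	struct sockaddr_in sin;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = htons(port);	/* 0 lets the kernel pick a port */

	if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
	    setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf)) < 0 ||
	    listen(fd, 16) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Read back the receive buffer size the kernel reports, or -1 on error. */
int get_rcvbuf(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &val, &len) < 0)
		return -1;
	return val;
}
```

The value read back with get_rcvbuf() may differ from the value requested, since the kernel can adjust or cap it, so a caller should check the result rather than assume the request was honoured as given.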
> This patch also changes the size of the receive sock buffer to be same
> as the send sock buffer. Both buffers are now a multiple of maxpayload
> and number of nfsd threads.
It would be nice if we could get some review from someone who remembers
what the justification for the smaller receive buffer size was (Neil?).
> This patch fixes the problem that receive window never opens beyond the
> default TCP receive window size set by the 2nd parameter of the
> net.ipv4.tcp_rmem sysctl.
Do you know what it does in the udp case?
--b.
>
> Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu>
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index c75bffe..178b397 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -1191,7 +1191,7 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
> */
> svc_sock_setbufsize(svsk->sk_sock,
> (serv->sv_nrthreads+3) * serv->sv_max_mesg,
> - 3 * serv->sv_max_mesg);
> + (serv->sv_nrthreads+3) * serv->sv_max_mesg);
>
> clear_bit(SK_DATA, &svsk->sk_flags);
>
> @@ -1372,11 +1372,6 @@ svc_tcp_init(struct svc_sock *svsk)
> * receive and respond to one request.
> * svc_tcp_recvfrom will re-adjust if necessary
> */
> - svc_sock_setbufsize(svsk->sk_sock,
> - 3 * svsk->sk_server->sv_max_mesg,
> - 3 * svsk->sk_server->sv_max_mesg);
> -
> - set_bit(SK_CHNGBUF, &svsk->sk_flags);
> set_bit(SK_DATA, &svsk->sk_flags);
> if (sk->sk_state != TCP_ESTABLISHED)
> set_bit(SK_CLOSE, &svsk->sk_flags);
> @@ -1761,6 +1756,8 @@ static int svc_create_socket(struct svc_serv *serv, int protocol,
>
> if (type == SOCK_STREAM)
> sock->sk->sk_reuse = 1; /* allow address reuse */
> + svc_sock_setbufsize(sock, (serv->sv_nrthreads+3) * serv->sv_max_mesg,
> + (serv->sv_nrthreads+3) * serv->sv_max_mesg);
> error = kernel_bind(sock, sin, len);
> if (error < 0)
> goto bummer;
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] NFSD: fix use of setsockopt
2008-06-25 19:37 ` J. Bruce Fields
@ 2008-06-25 20:44 ` Olga Kornievskaia
2008-06-26 17:14 ` J. Bruce Fields
2008-06-26 17:59 ` Sm-notify Laurenz, Dirk
1 sibling, 1 reply; 12+ messages in thread
From: Olga Kornievskaia @ 2008-06-25 20:44 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: linux-nfs, Neil Brown
J. Bruce Fields wrote:
> On Thu, Jun 19, 2008 at 09:33:39AM -0400, Olga Kornievskaia wrote:
>
>> The following patch fixes NFS server's use of setsockopt. For this
>> function to take an effect it first needs be called after socket
>> creation but before sock binding.
>>
>
> The tcp(7) man page actually claims that it's listen() and connect()
> that matter (so the setsockopt is effective on (and only on) unconnected
> sockets), so probably this could go after the bind and before the
> listen? Not that it matters.
>
>
>> This patch also changes the size of the receive sock buffer to be same
>> as the send sock buffer. Both buffers are now a multiple of maxpayload
>> and number of nfsd threads.
>>
>
> It would be nice if we could get some review from someone who remembers
> what the justification for the smaller receive buffer size was (Neil?).
>
>
>> This patch fixes the problem that receive window never opens beyond the
>> default TCP receive window size set by the 2nd parameter of the
>> net.ipv4.tcp_rmem sysctl.
>>
>
> Do you know what it does in the udp case?
>
Looking at the kernel code, when setsockopt() is called on a UDP socket
to set the send/receive buffer sizes, the code will not do anything:
udp_setsockopt() and udp_lib_setsockopt() will return -ENOPROTOOPT.
However, we bypass the call to setsockopt() and instead set the buffer
sizes directly. From what I understand, sk_sndbuf/sk_rcvbuf are not used
by the UDP code, so we are setting fields that are never used.
Then perhaps we can remove the calls to svc_sock_setbufsize() from
svc_udp_init() and svc_udp_recvfrom()?
> --b.
>
>
>> Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu>
>>
>
>
>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>> index c75bffe..178b397 100644
>> --- a/net/sunrpc/svcsock.c
>> +++ b/net/sunrpc/svcsock.c
>> @@ -1191,7 +1191,7 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
>> */
>> svc_sock_setbufsize(svsk->sk_sock,
>> (serv->sv_nrthreads+3) * serv->sv_max_mesg,
>> - 3 * serv->sv_max_mesg);
>> + (serv->sv_nrthreads+3) * serv->sv_max_mesg);
>>
>> clear_bit(SK_DATA, &svsk->sk_flags);
>>
>> @@ -1372,11 +1372,6 @@ svc_tcp_init(struct svc_sock *svsk)
>> * receive and respond to one request.
>> * svc_tcp_recvfrom will re-adjust if necessary
>> */
>> - svc_sock_setbufsize(svsk->sk_sock,
>> - 3 * svsk->sk_server->sv_max_mesg,
>> - 3 * svsk->sk_server->sv_max_mesg);
>> -
>> - set_bit(SK_CHNGBUF, &svsk->sk_flags);
>> set_bit(SK_DATA, &svsk->sk_flags);
>> if (sk->sk_state != TCP_ESTABLISHED)
>> set_bit(SK_CLOSE, &svsk->sk_flags);
>> @@ -1761,6 +1756,8 @@ static int svc_create_socket(struct svc_serv *serv, int protocol,
>>
>> if (type == SOCK_STREAM)
>> sock->sk->sk_reuse = 1; /* allow address reuse */
>> + svc_sock_setbufsize(sock, (serv->sv_nrthreads+3) * serv->sv_max_mesg,
>> + (serv->sv_nrthreads+3) * serv->sv_max_mesg);
>> error = kernel_bind(sock, sin, len);
>> if (error < 0)
>> goto bummer;
>>
>
>
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] NFSD: fix use of setsockopt
2008-06-25 20:44 ` Olga Kornievskaia
@ 2008-06-26 17:14 ` J. Bruce Fields
0 siblings, 0 replies; 12+ messages in thread
From: J. Bruce Fields @ 2008-06-26 17:14 UTC (permalink / raw)
To: Olga Kornievskaia; +Cc: linux-nfs, Neil Brown
On Wed, Jun 25, 2008 at 04:44:15PM -0400, Olga Kornievskaia wrote:
> Looking at the kernel code, when setsockopt() is called on a UDP socket
> to set send/receive buffer for UDP the code will not do anything:
> udp_setsockopt() and udp_lib_setsockopt() will return -ENOPROTOOPT.
> However, we bypass the call to setsockopt() and instead set the buffer
> sizes directly. From what I understand sk_sndbuf/sk_rcvbuf are not used
> by the UDP code. We are setting the fields that are never used.
>
> Then perhaps we can remove calls to svc_sock_setbufsize() from
> svc_udp_init() and svc_udp_recvfrom()?
Assuming you're correct about udp not using those fields (haven't
checked myself)--yes, that'd be great.
--b.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Sm-notify
2008-06-25 19:37 ` J. Bruce Fields
2008-06-25 20:44 ` Olga Kornievskaia
@ 2008-06-26 17:59 ` Laurenz, Dirk
[not found] ` <FC3FA7C4E1CA2348B5579A68B24556E9DA4AF2946C-KofoAzQUpSBAuK1PVaBULA@public.gmane.org>
1 sibling, 1 reply; 12+ messages in thread
From: Laurenz, Dirk @ 2008-06-26 17:59 UTC (permalink / raw)
To: linux-nfs@vger.kernel.org; +Cc: Oeltze, Benjamin
Hi,
Does anybody know exactly how sm-notify works on a SUSE SLES 9 system?
Greetings,
Dirk
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2008-06-28 10:33 UTC | newest]
Thread overview: 12+ messages
2008-06-19 13:33 [PATCH] NFSD: fix use of setsockopt Olga Kornievskaia
2008-06-25 0:50 ` Dean Hildebrand
2008-06-25 19:40 ` J. Bruce Fields
2008-06-25 19:37 ` J. Bruce Fields
2008-06-25 20:44 ` Olga Kornievskaia
2008-06-26 17:14 ` J. Bruce Fields
2008-06-26 17:59 ` Sm-notify Laurenz, Dirk
[not found] ` <FC3FA7C4E1CA2348B5579A68B24556E9DA4AF2946C-KofoAzQUpSBAuK1PVaBULA@public.gmane.org>
2008-06-27 2:22 ` Sm-notify NeilBrown
[not found] ` <46260.192.168.1.70.1214533375.squirrel-eq65iwfR9nKIECXXMXunQA@public.gmane.org>
2008-06-27 6:58 ` Sm-notify Oeltze, Benjamin
2008-06-27 23:44 ` Sm-notify Neil Brown
[not found] ` <18533.31573.855657.391140-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2008-06-28 7:38 ` Sm-notify Laurenz, Dirk
2008-06-28 10:33 ` Sm-notify Neil Brown