From: Andrew Morton <akpm@linux-foundation.org>
To: Jeff Moyer <jmoyer@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>,
linux-kernel@vger.kernel.org, "Rafael J. Wysocki" <rjw@sisk.pl>,
Olga Kornievskaia <aglo@citi.umich.edu>,
"J. Bruce Fields" <bfields@fieldses.org>,
Jim Rees <rees@umich.edu>,
linux-nfs@vger.kernel.org
Subject: Re: 2.6.30-rc deadline scheduler performance regression for iozone over NFS
Date: Tue, 12 May 2009 20:44:33 -0700
Message-ID: <20090512204433.7eb69075.akpm@linux-foundation.org>
In-Reply-To: <x494ovp4r51.fsf@segfault.boston.devel.redhat.com>

(obvious cc's added...)

It's an iozone performance regression.

On Tue, 12 May 2009 23:29:30 -0400 Jeff Moyer <jmoyer@redhat.com> wrote:
> Jens Axboe <jens.axboe@oracle.com> writes:
>
> > On Mon, May 11 2009, Jeff Moyer wrote:
> >> Jens Axboe <jens.axboe@oracle.com> writes:
> >>
> >> > On Fri, May 08 2009, Andrew Morton wrote:
> >> >> On Thu, 23 Apr 2009 10:01:58 -0400
> >> >> Jeff Moyer <jmoyer@redhat.com> wrote:
> >> >>
> >> >> > Hi,
> >> >> >
> >> >> > I've been working on CFQ improvements for interleaved I/Os between
> >> >> > processes, and noticed a regression in performance when using the
> >> >> > deadline I/O scheduler. The test uses a server configured with a cciss
> >> >> > array and 1Gb/s ethernet.
> >> >> >
> >> >> > The iozone command line was:
> >> >> > iozone -s 2000000 -r 64 -f /mnt/test/testfile -i 1 -w
> >> >> >
> >> >> > The numbers in the nfsd's row represent the number of nfsd "threads".
> >> >> > These numbers (in KB/s, iozone's default unit) represent the average of 5 runs.
> >> >> >
> >> >> > v2.6.29
> >> >> >
> >> >> > nfsd's  |     1 |     2 |     4 |      8
> >> >> > --------+-------+-------+-------+-------
> >> >> > deadline| 43207 | 67436 | 96289 | 107590
> >> >> >
> >> >> > 2.6.30-rc1
> >> >> >
> >> >> > nfsd's  |     1 |     2 |     4 |      8
> >> >> > --------+-------+-------+-------+-------
> >> >> > deadline| 43732 | 68059 | 76659 |  83231
> >> >> >
> >> >> > 2.6.30-rc3.block-for-linus
> >> >> >
> >> >> > nfsd's  |     1 |     2 |     4 |      8
> >> >> > --------+-------+-------+-------+-------
> >> >> > deadline| 46102 | 71151 | 83120 |  82330
> >> >> >
> >> >> >
> >> >> > Notice the drop for 4 and 8 threads. It may be worth noting that the
> >> >> > default number of NFSD threads is 8.
> >> >> >
> >> >>
> >> >> I guess we should ask Rafael to add this to the post-2.6.29 regression
> >> >> list.
> >> >
> >> > I agree. It'd be nice to bisect this one down; I'm guessing some mm
> >> > change has caused this writeout regression.
> >>
> >> It's not writeout, it's a read test.
> >
> > Doh sorry, I even ran these tests as well a few weeks back. So perhaps
> > some read-ahead change, I didn't look into it. FWIW, on a single SATA
> > drive here, it didn't show any difference.
>
> OK, I bisected this to the following commit. The mount is done using
> NFSv3, by the way.
>
> commit 47a14ef1af48c696b214ac168f056ddc79793d0e
> Author: Olga Kornievskaia <aglo@citi.umich.edu>
> Date: Tue Oct 21 14:13:47 2008 -0400
>
> svcrpc: take advantage of tcp autotuning
>
> Allow the NFSv4 server to make use of TCP autotuning behaviour, which
> was previously disabled by setting the sk_userlocks variable.
>
> Set the receive buffers to be big enough to receive the whole RPC
> request, and set this for the listening socket, not the accept socket.
>
> Remove the code that readjusts the receive/send buffer sizes for the
> accepted socket. Previously this code was used to influence the TCP
> window management behaviour, which is no longer needed when autotuning
> is enabled.
>
> This can improve IO bandwidth on networks with high bandwidth-delay
> products, where a large tcp window is required. It also simplifies
> performance tuning, since getting adequate tcp buffers previously
> required increasing the number of nfsd threads.
>
> Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu>
> Cc: Jim Rees <rees@umich.edu>
> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
>
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 5763e64..7a2a90f 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -345,7 +345,6 @@ static void svc_sock_setbufsize(struct socket *sock, unsigned int snd,
> lock_sock(sock->sk);
> sock->sk->sk_sndbuf = snd * 2;
> sock->sk->sk_rcvbuf = rcv * 2;
> - sock->sk->sk_userlocks |= SOCK_SNDBUF_LOCK|SOCK_RCVBUF_LOCK;
> release_sock(sock->sk);
> #endif
> }
> @@ -797,23 +796,6 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp)
> test_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags),
> test_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags));
>
> - if (test_and_clear_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags))
> - /* sndbuf needs to have room for one request
> - * per thread, otherwise we can stall even when the
> - * network isn't a bottleneck.
> - *
> - * We count all threads rather than threads in a
> - * particular pool, which provides an upper bound
> - * on the number of threads which will access the socket.
> - *
> - * rcvbuf just needs to be able to hold a few requests.
> - * Normally they will be removed from the queue
> - * as soon a a complete request arrives.
> - */
> - svc_sock_setbufsize(svsk->sk_sock,
> - (serv->sv_nrthreads+3) * serv->sv_max_mesg,
> - 3 * serv->sv_max_mesg);
> -
> clear_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
>
> /* Receive data. If we haven't got the record length yet, get
> @@ -1061,15 +1043,6 @@ static void svc_tcp_init(struct svc_sock *svsk, struct svc_serv *serv)
>
> tcp_sk(sk)->nonagle |= TCP_NAGLE_OFF;
>
> - /* initialise setting must have enough space to
> - * receive and respond to one request.
> - * svc_tcp_recvfrom will re-adjust if necessary
> - */
> - svc_sock_setbufsize(svsk->sk_sock,
> - 3 * svsk->sk_xprt.xpt_server->sv_max_mesg,
> - 3 * svsk->sk_xprt.xpt_server->sv_max_mesg);
> -
> - set_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags);
> set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
> if (sk->sk_state != TCP_ESTABLISHED)
> set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
> @@ -1140,8 +1113,14 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
> /* Initialize the socket */
> if (sock->type == SOCK_DGRAM)
> svc_udp_init(svsk, serv);
> - else
> + else {
> + /* initialise setting must have enough space to
> + * receive and respond to one request.
> + */
> + svc_sock_setbufsize(svsk->sk_sock, 4 * serv->sv_max_mesg,
> + 4 * serv->sv_max_mesg);
> svc_tcp_init(svsk, serv);
> + }
>
> /*
> * We start one listener per sv_serv. We want AF_INET