From: Peter Staubach <staubach@redhat.com>
To: Greg Banks <gnb@sgi.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>,
Linux NFS ML <linux-nfs@vger.kernel.org>
Subject: Re: [patch 2/3] knfsd: avoid overloading the CPU scheduler with enormous load averages
Date: Tue, 13 Jan 2009 09:33:00 -0500 [thread overview]
Message-ID: <496CA61C.5050208@redhat.com> (raw)
In-Reply-To: <20090113102653.664553000@sgi.com>
Greg Banks wrote:
> Avoid overloading the CPU scheduler with enormous load averages
> when handling high call-rate NFS loads. When the knfsd bottom half
> is made aware of an incoming call by the socket layer, it tries to
> choose an nfsd thread and wake it up. As long as there are idle
> threads, one will be woken up.
>
> If there are lot of nfsd threads (a sensible configuration when
> the server is disk-bound or is running an HSM), there will be many
> more nfsd threads than CPUs to run them. Under a high call-rate
> low service-time workload, the result is that almost every nfsd is
> runnable, but only a handful are actually able to run. This situation
> causes two significant problems:
>
> 1. The CPU scheduler takes over 10% of each CPU, which is robbing
> the nfsd threads of valuable CPU time.
>
> 2. At a high enough load, the nfsd threads starve userspace threads
> of CPU time, to the point where daemons like portmap and rpc.mountd
> do not schedule for tens of seconds at a time. Clients attempting
> to mount an NFS filesystem timeout at the very first step (opening
> a TCP connection to portmap) because portmap cannot wake up from
> select() and call accept() in time.
>
> Disclaimer: these effects were observed on a SLES9 kernel, modern
> kernels' schedulers may behave more gracefully.
>
> The solution is simple: keep in each svc_pool a counter of the number
> of threads which have been woken but have not yet run, and do not wake
> any more if that count reaches an arbitrary small threshold.
>
> Testing was on a 4 CPU 4 NIC Altix using 4 IRIX clients, each with 16
> synthetic client threads simulating an rsync (i.e. recursive directory
> listing) workload reading from an i386 RH9 install image (161480
> regular files in 10841 directories) on the server. That tree is small
> enough to fill in the server's RAM so no disk traffic was involved.
> This setup gives a sustained call rate in excess of 60000 calls/sec
> before being CPU-bound on the server. The server was running 128 nfsds.
>
> Profiling showed schedule() taking 6.7% of every CPU, and __wake_up()
> taking 5.2%. This patch drops those contributions to 3.0% and 2.2%.
> Load average was over 120 before the patch, and 20.9 after.
>
> This patch is a forward-ported version of knfsd-avoid-nfsd-overload
> which has been shipping in the SGI "Enhanced NFS" product since 2006.
> It has been posted before:
>
> http://article.gmane.org/gmane.linux.nfs/10374
>
> Signed-off-by: Greg Banks <gnb@sgi.com>
> ---
Have you measured the impact of these changes for something
like SpecSFS?
Thanx...
ps
next prev parent reply other threads:[~2009-01-13 14:33 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-01-13 10:26 [patch 0/3] First tranche of SGI Enhanced NFS patches Greg Banks
2009-01-13 10:26 ` [patch 1/3] knfsd: remove the nfsd thread busy histogram Greg Banks
2009-01-13 16:41 ` Chuck Lever
2009-01-13 22:50 ` Greg Banks
[not found] ` <496D1ACC.7070106-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org>
2009-02-11 21:59 ` J. Bruce Fields
2009-01-13 10:26 ` [patch 2/3] knfsd: avoid overloading the CPU scheduler with enormous load averages Greg Banks
2009-01-13 14:33 ` Peter Staubach [this message]
2009-01-13 22:15 ` Greg Banks
[not found] ` <496D1294.1060407-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org>
2009-01-13 22:35 ` Peter Staubach
2009-01-13 23:04 ` Greg Banks
2009-02-11 23:10 ` J. Bruce Fields
2009-02-19 6:25 ` Greg Banks
2009-03-15 21:21 ` J. Bruce Fields
2009-03-16 3:10 ` Greg Banks
2009-01-13 10:26 ` [patch 3/3] knfsd: add file to export stats about nfsd pools Greg Banks
2009-02-12 17:11 ` J. Bruce Fields
2009-02-13 1:53 ` Kevin Constantine
2009-02-19 7:04 ` Greg Banks
2009-02-19 6:42 ` Greg Banks
2009-03-15 21:25 ` J. Bruce Fields
2009-03-16 3:21 ` Greg Banks
2009-03-16 13:37 ` J. Bruce Fields
2009-02-09 5:24 ` [patch 0/3] First tranche of SGI Enhanced NFS patches Greg Banks
2009-02-09 20:47 ` J. Bruce Fields
2009-02-09 23:26 ` Greg Banks
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=496CA61C.5050208@redhat.com \
--to=staubach@redhat.com \
--cc=bfields@fieldses.org \
--cc=gnb@sgi.com \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.