From: Peter Staubach <staubach@redhat.com>
To: Greg Banks <gnb@sgi.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>,
Linux NFS ML <linux-nfs@vger.kernel.org>
Subject: Re: [patch 2/3] knfsd: avoid overloading the CPU scheduler with enormous load averages
Date: Tue, 13 Jan 2009 09:33:00 -0500 [thread overview]
Message-ID: <496CA61C.5050208@redhat.com> (raw)
In-Reply-To: <20090113102653.664553000@sgi.com>
Greg Banks wrote:
> Avoid overloading the CPU scheduler with enormous load averages
> when handling high call-rate NFS loads. When the knfsd bottom half
> is made aware of an incoming call by the socket layer, it tries to
> choose an nfsd thread and wake it up. As long as there are idle
> threads, one will be woken up.
>
> If there are a lot of nfsd threads (a sensible configuration when
> the server is disk-bound or is running an HSM), there will be many
> more nfsd threads than CPUs to run them. Under a high call-rate
> low service-time workload, the result is that almost every nfsd is
> runnable, but only a handful are actually able to run. This situation
> causes two significant problems:
>
> 1. The CPU scheduler takes over 10% of each CPU, which is robbing
> the nfsd threads of valuable CPU time.
>
> 2. At a high enough load, the nfsd threads starve userspace threads
> of CPU time, to the point where daemons like portmap and rpc.mountd
> do not schedule for tens of seconds at a time. Clients attempting
> to mount an NFS filesystem time out at the very first step (opening
> a TCP connection to portmap) because portmap cannot wake up from
> select() and call accept() in time.
>
> Disclaimer: these effects were observed on a SLES9 kernel, modern
> kernels' schedulers may behave more gracefully.
>
> The solution is simple: keep in each svc_pool a counter of the number
> of threads which have been woken but have not yet run, and do not wake
> any more if that count reaches an arbitrary small threshold.
>
> Testing was on a 4 CPU 4 NIC Altix using 4 IRIX clients, each with 16
> synthetic client threads simulating an rsync (i.e. recursive directory
> listing) workload reading from an i386 RH9 install image (161480
> regular files in 10841 directories) on the server. That tree is small
> enough to fit in the server's RAM, so no disk traffic was involved.
> This setup gives a sustained call rate in excess of 60000 calls/sec
> before being CPU-bound on the server. The server was running 128 nfsds.
>
> Profiling showed schedule() taking 6.7% of every CPU, and __wake_up()
> taking 5.2%. This patch drops those contributions to 3.0% and 2.2%.
> Load average was over 120 before the patch, and 20.9 after.
>
> This patch is a forward-ported version of knfsd-avoid-nfsd-overload
> which has been shipping in the SGI "Enhanced NFS" product since 2006.
> It has been posted before:
>
> http://article.gmane.org/gmane.linux.nfs/10374
>
> Signed-off-by: Greg Banks <gnb@sgi.com>
> ---
Have you measured the impact of these changes for something
like SpecSFS?
Thanx...
ps
Thread overview: 25+ messages
2009-01-13 10:26 [patch 0/3] First tranche of SGI Enhanced NFS patches Greg Banks
2009-01-13 10:26 ` [patch 1/3] knfsd: remove the nfsd thread busy histogram Greg Banks
2009-01-13 16:41 ` Chuck Lever
2009-01-13 22:50 ` Greg Banks
[not found] ` <496D1ACC.7070106-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org>
2009-02-11 21:59 ` J. Bruce Fields
2009-01-13 10:26 ` [patch 2/3] knfsd: avoid overloading the CPU scheduler with enormous load averages Greg Banks
2009-01-13 14:33 ` Peter Staubach [this message]
2009-01-13 22:15 ` Greg Banks
[not found] ` <496D1294.1060407-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org>
2009-01-13 22:35 ` Peter Staubach
2009-01-13 23:04 ` Greg Banks
2009-02-11 23:10 ` J. Bruce Fields
2009-02-19 6:25 ` Greg Banks
2009-03-15 21:21 ` J. Bruce Fields
2009-03-16 3:10 ` Greg Banks
2009-01-13 10:26 ` [patch 3/3] knfsd: add file to export stats about nfsd pools Greg Banks
2009-02-12 17:11 ` J. Bruce Fields
2009-02-13 1:53 ` Kevin Constantine
2009-02-19 7:04 ` Greg Banks
2009-02-19 6:42 ` Greg Banks
2009-03-15 21:25 ` J. Bruce Fields
2009-03-16 3:21 ` Greg Banks
2009-03-16 13:37 ` J. Bruce Fields
2009-02-09 5:24 ` [patch 0/3] First tranche of SGI Enhanced NFS patches Greg Banks
2009-02-09 20:47 ` J. Bruce Fields
2009-02-09 23:26 ` Greg Banks