From: Peter Staubach <staubach@redhat.com>
To: Greg Banks <gnb@sgi.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>,
Linux NFS ML <linux-nfs@vger.kernel.org>
Subject: Re: [patch 2/3] knfsd: avoid overloading the CPU scheduler with enormous load averages
Date: Tue, 13 Jan 2009 09:33:00 -0500 [thread overview]
Message-ID: <496CA61C.5050208@redhat.com> (raw)
In-Reply-To: <20090113102653.664553000@sgi.com>
Greg Banks wrote:
> Avoid overloading the CPU scheduler with enormous load averages
> when handling high call-rate NFS loads. When the knfsd bottom half
> is made aware of an incoming call by the socket layer, it tries to
> choose an nfsd thread and wake it up. As long as there are idle
> threads, one will be woken up.
>
> If there are a lot of nfsd threads (a sensible configuration when
> the server is disk-bound or is running an HSM), there will be many
> more nfsd threads than CPUs to run them. Under a high call-rate
> low service-time workload, the result is that almost every nfsd is
> runnable, but only a handful are actually able to run. This situation
> causes two significant problems:
>
> 1. The CPU scheduler takes over 10% of each CPU, which is robbing
> the nfsd threads of valuable CPU time.
>
> 2. At a high enough load, the nfsd threads starve userspace threads
> of CPU time, to the point where daemons like portmap and rpc.mountd
> do not schedule for tens of seconds at a time. Clients attempting
> to mount an NFS filesystem time out at the very first step (opening
> a TCP connection to portmap) because portmap cannot wake up from
> select() and call accept() in time.
>
> Disclaimer: these effects were observed on a SLES9 kernel, modern
> kernels' schedulers may behave more gracefully.
>
> The solution is simple: keep in each svc_pool a counter of the number
> of threads which have been woken but have not yet run, and do not wake
> any more if that count reaches an arbitrary small threshold.
>
> Testing was on a 4 CPU 4 NIC Altix using 4 IRIX clients, each with 16
> synthetic client threads simulating an rsync (i.e. recursive directory
> listing) workload reading from an i386 RH9 install image (161480
> regular files in 10841 directories) on the server. That tree is small
> enough to fit in the server's RAM, so no disk traffic was involved.
> This setup gives a sustained call rate in excess of 60000 calls/sec
> before being CPU-bound on the server. The server was running 128 nfsds.
>
> Profiling showed schedule() taking 6.7% of every CPU, and __wake_up()
> taking 5.2%. This patch drops those contributions to 3.0% and 2.2%.
> Load average was over 120 before the patch, and 20.9 after.
>
> This patch is a forward-ported version of knfsd-avoid-nfsd-overload
> which has been shipping in the SGI "Enhanced NFS" product since 2006.
> It has been posted before:
>
> http://article.gmane.org/gmane.linux.nfs/10374
>
> Signed-off-by: Greg Banks <gnb@sgi.com>
> ---
Have you measured the impact of these changes for something
like SpecSFS?
Thanx...
ps
Thread overview: 25+ messages
2009-01-13 10:26 [patch 0/3] First tranche of SGI Enhanced NFS patches Greg Banks
2009-01-13 10:26 ` [patch 1/3] knfsd: remove the nfsd thread busy histogram Greg Banks
2009-01-13 16:41 ` Chuck Lever
2009-01-13 22:50 ` Greg Banks
[not found] ` <496D1ACC.7070106-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org>
2009-02-11 21:59 ` J. Bruce Fields
2009-01-13 10:26 ` [patch 2/3] knfsd: avoid overloading the CPU scheduler with enormous load averages Greg Banks
2009-01-13 14:33 ` Peter Staubach [this message]
2009-01-13 22:15 ` Greg Banks
[not found] ` <496D1294.1060407-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org>
2009-01-13 22:35 ` Peter Staubach
2009-01-13 23:04 ` Greg Banks
2009-02-11 23:10 ` J. Bruce Fields
2009-02-19 6:25 ` Greg Banks
2009-03-15 21:21 ` J. Bruce Fields
2009-03-16 3:10 ` Greg Banks
2009-01-13 10:26 ` [patch 3/3] knfsd: add file to export stats about nfsd pools Greg Banks
2009-02-12 17:11 ` J. Bruce Fields
2009-02-13 1:53 ` Kevin Constantine
2009-02-19 7:04 ` Greg Banks
2009-02-19 6:42 ` Greg Banks
2009-03-15 21:25 ` J. Bruce Fields
2009-03-16 3:21 ` Greg Banks
2009-03-16 13:37 ` J. Bruce Fields
2009-02-09 5:24 ` [patch 0/3] First tranche of SGI Enhanced NFS patches Greg Banks
2009-02-09 20:47 ` J. Bruce Fields
2009-02-09 23:26 ` Greg Banks