From: Ingo Molnar <mingo@elte.hu>
To: William Lee Irwin III <wli@holomorphy.com>
Cc: Peter Williams <pwil3058@bigpond.net.au>,
linux-kernel@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
Con Kolivas <kernel@kolivas.org>, Nick Piggin <npiggin@suse.de>,
Mike Galbraith <efault@gmx.de>,
Arjan van de Ven <arjan@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
caglar@pardus.org.tr, Willy Tarreau <w@1wt.eu>,
Gene Heskett <gene.heskett@gmail.com>
Subject: Re: [patch] CFS scheduler, v3
Date: Sat, 21 Apr 2007 10:57:29 +0200
Message-ID: <20070421085729.GD29800@elte.hu>
In-Reply-To: <20070421083317.GN31925@holomorphy.com>
* William Lee Irwin III <wli@holomorphy.com> wrote:
> I suppose this is a special case of the dreaded priority inversion.
> What of, say, nice 19 tasks holding fs semaphores and/or mutexes that
> nice -19 tasks are waiting to acquire? Perhaps rt_mutex should be the
> default mutex implementation.
While I agree that it could be an issue, lock inversion is nothing
really new, so I'd not go so far as to convert all mutexes to
rtmutexes. (I've taken my -rt/PREEMPT_RT hat off here.)
For example, reiser3-based systems get pretty laggy under significant
reniced load (even with the vanilla scheduler) if CONFIG_PREEMPT_BKL is
enabled: reiser3 holds the BKL for extended periods of time, so a "make
-j50" workload can starve it significantly, and the tty layer's BKL use
makes any sort of keyboard input (even over ssh) laggy.
Other locks, though, are not held this frequently, and the mutex
implementation is pretty fair to waiters anyway. (The semaphore
implementation is not nearly as fair, and the Big Kernel Semaphore
is still based on struct semaphore.) So I'd really wait for specific
workloads to trigger problems, and _maybe_ convert certain mutexes to
rtmutexes on an as-needed basis.
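The priority-inheritance idea behind rt_mutex, which wli suggests above, can be illustrated with a toy model. This is a minimal sketch, not the kernel's rt_mutex code: the types and functions (toy_task, toy_pi_lock, toy_nice_to_prio(), toy_pi_block_on(), toy_pi_unlock()) are hypothetical, and the model ignores wait queues, chains of locks and how a scheduler turns a boosted priority into CPU time. It assumes the kernel convention of a static priority of 120 + nice, where a lower number means a more important task.

```c
#include <stddef.h>

/* Kernel convention: static priority = 120 + nice (lower = more important) */
static int toy_nice_to_prio(int nice)
{
	return 120 + nice;
}

struct toy_task {
	int normal_prio;	/* priority derived from the task's nice level */
	int prio;		/* effective priority, possibly boosted by PI */
};

struct toy_pi_lock {
	struct toy_task *owner;	/* NULL when the lock is free */
};

/*
 * A waiter blocking on an owned lock lends its priority to the owner if
 * the waiter is more important, so the holder is treated as if it had
 * the waiter's priority until it releases the lock.
 */
static void toy_pi_block_on(struct toy_pi_lock *lock, struct toy_task *waiter)
{
	if (lock->owner && waiter->prio < lock->owner->prio)
		lock->owner->prio = waiter->prio;
}

/* On release the owner falls back to its own (unboosted) priority. */
static void toy_pi_unlock(struct toy_pi_lock *lock)
{
	if (lock->owner) {
		lock->owner->prio = lock->owner->normal_prio;
		lock->owner = NULL;
	}
}
```

For example, a nice 19 holder (prio 139) that a nice -19 task (prio 101) blocks on runs at prio 101 until it unlocks, then drops back to 139.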
> > In any case, it is clear that rq->raw_cpu_load should be used instead of
> > rq->nr_running, when calculating the fair clock, but i begin to like the
> > nice_offset solution too in addition of this: it's effective in practice
> > and starvation-free in theory, and most importantly, it's very simple.
> > We could even make the nice offset granularity tunable, just in case
> > anyone wants to weaken (or strengthen) the effectivity of nice levels.
> > What do you think, can you see any obvious (or less obvious)
> > showstoppers with this approach?
>
> ->nice_offset's semantics are not meaningful to the end user,
> regardless of whether it's effective. [...]
Yeah, agreed. That's one reason why I didn't make it tunable: it's pretty
meaningless to the user.
> [...] If there is something to be tuned, it should be relative shares
> of CPU bandwidth (load_weight) corresponding to each nice level or
> something else directly observable. The implementation could be
> ->nice_offset, if it suffices.
>
> Suppose a table of nice weights like the following is tuned via
> /proc/:
>
> nice   weight
>  -20   21
>   -1   2
>    0   1
>   19   0.0476
>
> Essentially 1/(n+1) when n >= 0 and 1-n when n < 0.
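The quoted weight formula can be checked with a few lines of C. This sketch (the function names are hypothetical) follows the formula as written, so nice 19 gets 1/(19+1) = 0.05; the 0.0476 in the table equals 1/21, which coincides with the CPU share a nice 19 task gets against a single nice 0 task (0.05/1.05). Computing a share as a task's weight divided by the sum of all runnable tasks' weights is an assumption about how such weights would be applied.

```c
/* wli's proposed weight for nice level n, relative to nice 0 */
static double wli_nice_weight(int nice)
{
	if (nice >= 0)
		return 1.0 / (nice + 1);
	return 1.0 - nice;
}

/*
 * Fraction of the CPU a task at 'nice' receives when it shares the CPU
 * with the n runnable tasks listed in 'nices' (itself included).
 */
static double wli_cpu_share(int nice, const int *nices, int n)
{
	double total = 0.0;

	for (int i = 0; i < n; i++)
		total += wli_nice_weight(nices[i]);
	return wli_nice_weight(nice) / total;
}
```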
OK, thanks for thinking about it. I have changed the nice weight in
CFSv5-to-be so that it defaults to something pretty close to your
suggestion: the ratio between a nice 0 loop and a nice 19 loop is now
set to about 2%. (This is something that users have requested for some
time; the default of ~5% is a tad high when running reniced SETI jobs, etc.)
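The mail does not define these percentages precisely. Assuming they mean the share of the CPU a nice 19 loop gets while competing with one nice 0 loop (nice 0 weight taken as 1), this sketch converts between that share and the nice 19 weight; both function names are hypothetical. Under that reading a weight of 0.05 yields about 4.8%, in line with the ~5% default, and a 2% share needs a weight of roughly 0.0204.

```c
/* Share of a nice 19 loop of weight w19 against one nice 0 loop of weight 1 */
static double nice19_share(double w19)
{
	return w19 / (1.0 + w19);
}

/* Inverse: the nice 19 weight needed for a target share s, 0 < s < 1 */
static double nice19_weight_for_share(double s)
{
	return s / (1.0 - s);
}
```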
The actual percentage scales almost proportionally with the nice offset
granularity value, but if this is exposed to users at all, I agree
that it would be better to expose it directly as some sort of
'ratio between nice 0 and nice 19 tasks', right? Or some other, more
fine-grained metric. Whole-percent units are too coarse, I think, and
0.1% units aren't intuitive enough. The sysctl handler would then
transform that 'human readable' sysctl value into the appropriate
internal nice-offset-granularity value (or whatever mechanism the
implementation ends up using).
I'd not do this as a per-nice-level setting but as a single value that
rescales the whole nice level range at once. That's a lot harder to
misconfigure, and we've got enough nice levels for users to pick from
almost arbitrarily, as long as they can influence the maximum.
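A single tunable that rescales the whole nice range could work roughly as follows. This is a hypothetical sketch, not the CFS implementation (which, per the mail, uses a nice offset granularity internally): the sysctl value is assumed to be the nice 19 weight in units of 0.1% of the nice 0 weight (used here only for illustration), and every nice level is assumed to multiply the weight by the same factor f, so that f^19 equals that ratio and each step below nice 0 divides by f. The conversion rejects out-of-range values and finds f by bisection, so it needs no libm.

```c
/*
 * Hypothetical sysctl-side conversion: 'permille' is the desired weight
 * of a nice 19 task in units of 0.1% of a nice 0 task's weight (20 = 2%).
 * On success stores the per-level factor f with f^19 == permille / 1000.
 */
static int nice_ratio_to_factor(int permille, double *factor)
{
	double target, lo, hi;

	/* 0 would starve nice 19; >= 1000 would flatten or invert nice */
	if (permille <= 0 || permille >= 1000)
		return -1;

	target = permille / 1000.0;
	lo = target;	/* target^19 < target, so f > target */
	hi = 1.0;	/* 1^19 >= target, so f <= 1 */

	/* Bisection on the increasing function x^19 over [target, 1] */
	for (int iter = 0; iter < 100; iter++) {
		double mid = (lo + hi) / 2.0;
		double p = 1.0;

		for (int k = 0; k < 19; k++)
			p *= mid;
		if (p < target)
			lo = mid;
		else
			hi = mid;
	}
	*factor = (lo + hi) / 2.0;
	return 0;
}

/* Weight of a nice level relative to nice 0: f^nice (1/f per step below 0) */
static double scaled_nice_weight(double factor, int nice)
{
	double w = 1.0;

	for (int k = 0; k < nice; k++)
		w *= factor;
	for (int k = 0; k < -nice; k++)
		w /= factor;
	return w;
}
```

With a setting of 20 (2%), f comes out at about 0.814 per nice level, and the one knob moves both ends of the range together.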
Does this sound mostly OK to you?
Ingo