From: Chris Snook <csnook@redhat.com>
To: Tong Li <tong.n.li@intel.com>
Cc: "Bill Huey (hui)" <billh@gnuppy.monkey.org>,
Ingo Molnar <mingo@elte.hu>,
linux-kernel@vger.kernel.org
Subject: Re: [RFC] scheduler: improve SMP fairness in CFS
Date: Sat, 28 Jul 2007 22:40:45 -0400
Message-ID: <46ABFE2D.1060505@redhat.com>
In-Reply-To: <Pine.LNX.4.64.0707281224100.22772@tongli.jf.intel.com>
Tong Li wrote:
> On Fri, 27 Jul 2007, Chris Snook wrote:
>
>> Bill Huey (hui) wrote:
>>> You have to consider the target for this kind of code. There are
>>> applications where you need something that falls within a constant
>>> error bound. According to the numbers, the current CFS rebalancing
>>> logic doesn't achieve that to any degree of rigor. So CFS is ok for
>>> SCHED_OTHER, but not for anything more strict than that.
>>
>> I've said from the beginning that I think that anyone who desperately
>> needs perfect fairness should be explicitly enforcing it with the aid
>> of realtime priorities. The problem is that configuring and tuning a
>> realtime application is a pain, and people want to be able to
>> approximate this behavior without doing a whole lot of dirty work
>> themselves. I believe that CFS can and should be enhanced to ensure
>> SMP-fairness over potentially short, user-configurable intervals, even
>> for SCHED_OTHER. I do not, however, believe that we should take it to
>> the extreme of wasting CPU cycles on migrations that will not improve
>> performance for *any* task, just to avoid letting some tasks get ahead
>> of others. We should be as fair as possible but no fairer. If we've
>> already made it as fair as possible, we should account for the margin
>> of error and correct for it the next time we rebalance. We should not
>> burn the surplus just to get rid of it.
>
> Proportional-share scheduling actually has one of its roots in real-time
> and having a p-fair scheduler is essential for real-time apps (soft
> real-time).
Sounds like another scheduler class might be in order. I find CFS to be
fair enough for most purposes. If the code that gives us near-perfect
fairness at the expense of efficiency only runs when tasks have been
given boosted priority by a privileged user, and only on the CPUs that
have such tasks queued on them, the run time overhead and code
complexity become much smaller concerns.
>>
>> On a non-NUMA box with single-socket, non-SMT processors, a constant
>> error bound is fine. Once we add SMT, go multi-core, go NUMA, and add
>> inter-chassis interconnects on top of that, we need to multiply this
>> error bound at each stage in the hierarchy, or else we'll end up
>> wasting CPU cycles on migrations that actually hurt the processes
>> they're supposed to be helping, and hurt everyone else even more. I
>> believe we should enforce an error bound that is proportional to
>> migration cost.
>>
>
> I think we are actually in agreement. When I say constant bound, it can
> certainly be a constant that's determined based on inputs from the
> memory hierarchy. The point is that it needs to be a constant
> independent of things like # of tasks.
Agreed.
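To make the point concrete, here is a minimal sketch of a bound that is constant per level of the hierarchy and scales with migration cost, but never with the number of tasks. The level names, the cost table, and both function names are made up for illustration; the numbers are placeholders, not measurements, and none of this is taken from the patch under discussion.

```c
#include <stdint.h>

/* Hypothetical topology levels, ordered from cheapest to costliest migration. */
enum topo_level {
    TOPO_SMT,
    TOPO_CORE,
    TOPO_NUMA_NODE,
    TOPO_CHASSIS,
    TOPO_NR_LEVELS
};

/* Assumed per-level migration costs in nanoseconds (illustrative placeholders). */
static const uint64_t migration_cost_ns[TOPO_NR_LEVELS] = {
    [TOPO_SMT]       = 1000,
    [TOPO_CORE]      = 10000,
    [TOPO_NUMA_NODE] = 100000,
    [TOPO_CHASSIS]   = 1000000,
};

/*
 * Tolerated unfairness for migrations that cross 'level'. It depends only on
 * the level's migration cost and a tunable scale factor, so it stays constant
 * no matter how many tasks are runnable.
 */
uint64_t fairness_error_bound_ns(enum topo_level level, uint64_t scale)
{
    return scale * migration_cost_ns[level];
}

/* Migrate only when the observed imbalance exceeds the bound for that level. */
int should_migrate(uint64_t imbalance_ns, enum topo_level level, uint64_t scale)
{
    return imbalance_ns > fairness_error_bound_ns(level, scale);
}
```

With a scale of 2, a 50000 ns imbalance would justify moving a task between cores (bound 20000 ns) but not across NUMA nodes (bound 200000 ns); the leftover error would be carried into the next rebalance rather than burned off.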
>> But this patch is only relevant to SCHED_OTHER. The realtime
>> scheduler doesn't have a concept of fairness, just priorities. That's
>> why each realtime priority level has its own separate runqueue.
>> Realtime schedulers are supposed to be dumb as a post, so they cannot
>> heuristically decide to do anything other than precisely what you
>> configured them to do, and so they don't get in the way when you're
>> context switching a million times a second.
>
> Are you referring to hard real-time? As I said, an infrastructure that
> enables p-fair scheduling, EDF, or things alike is the foundation for
> real-time. I designed DWRR, however, with a target of non-RT apps,
> although I was hoping the research results might be applicable to RT.
I'm referring to the static priority SCHED_FIFO and SCHED_RR schedulers,
which are (intentionally) dumb as a post, allowing userspace to manage
CPU time explicitly. Proportionally fair scheduling is a cool
capability, but not a design goal of those schedulers.
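For reference, this is roughly what "managing CPU time explicitly" looks like from userspace. It is a sketch only: the helper names are hypothetical, and the only real interfaces used are sched_setscheduler(), sched_get_priority_min() and sched_get_priority_max() from <sched.h>. The call normally fails with EPERM unless the caller has the appropriate privilege.

```c
#include <sched.h>
#include <sys/types.h>

/* Clamp a requested priority into the valid static range for 'policy'. */
int clamp_rt_priority(int policy, int prio)
{
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);

    if (prio < lo)
        return lo;
    if (prio > hi)
        return hi;
    return prio;
}

/*
 * Hypothetical helper: place 'pid' (0 means the calling thread) under
 * SCHED_FIFO or SCHED_RR at a fixed priority. The scheduler then does
 * exactly what was configured and applies no fairness heuristics.
 * Returns 0 on success, -1 with errno set on failure.
 */
int set_static_priority(pid_t pid, int policy, int prio)
{
    struct sched_param sp = {
        .sched_priority = clamp_rt_priority(policy, prio),
    };

    return sched_setscheduler(pid, policy, &sp);
}
```

Something like set_static_priority(0, SCHED_FIFO, 10) would pin the caller at a fixed priority; any notion of proportional share between such tasks is then entirely up to the application.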
-- Chris