public inbox for linux-kernel@vger.kernel.org
From: Chris Snook <csnook@redhat.com>
To: Tong Li <tong.n.li@intel.com>
Cc: "Bill Huey (hui)" <billh@gnuppy.monkey.org>,
	Ingo Molnar <mingo@elte.hu>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC] scheduler: improve SMP fairness in CFS
Date: Sat, 28 Jul 2007 22:40:45 -0400
Message-ID: <46ABFE2D.1060505@redhat.com>
In-Reply-To: <Pine.LNX.4.64.0707281224100.22772@tongli.jf.intel.com>

Tong Li wrote:
> On Fri, 27 Jul 2007, Chris Snook wrote:
> 
>> Bill Huey (hui) wrote:
>>> You have to consider the target for this kind of code. There are 
>>> applications
>>> where you need something that falls within a constant error bound. 
>>> According
>>> to the numbers, the current CFS rebalancing logic doesn't achieve 
>>> that to
>>> any degree of rigor. So CFS is ok for SCHED_OTHER, but not for 
>>> anything more
>>> strict than that.
>>
>> I've said from the beginning that I think that anyone who desperately 
>> needs perfect fairness should be explicitly enforcing it with the aid 
>> of realtime priorities.  The problem is that configuring and tuning a 
>> realtime application is a pain, and people want to be able to 
>> approximate this behavior without doing a whole lot of dirty work 
>> themselves.  I believe that CFS can and should be enhanced to ensure 
>> SMP-fairness over potentially short, user-configurable intervals, even 
>> for SCHED_OTHER.  I do not, however, believe that we should take it to 
>> the extreme of wasting CPU cycles on migrations that will not improve 
>> performance for *any* task, just to avoid letting some tasks get ahead 
>> of others.  We should be as fair as possible but no fairer.  If we've 
>> already made it as fair as possible, we should account for the margin 
>> of error and correct for it the next time we rebalance.  We should not 
>> burn the surplus just to get rid of it.
> 
> Proportional-share scheduling actually has one of its roots in real-time 
> and having a p-fair scheduler is essential for real-time apps (soft 
> real-time).

Sounds like another scheduler class might be in order.  I find CFS to be 
fair enough for most purposes.  If the code that gives us near-perfect 
fairness at the expense of efficiency only runs when tasks have been 
given boosted priority by a privileged user, and only on the CPUs that 
have such tasks queued on them, the runtime overhead and code 
complexity become much smaller concerns.

>>
>> On a non-NUMA box with single-socket, non-SMT processors, a constant 
>> error bound is fine.  Once we add SMT, go multi-core, go NUMA, and add 
>> inter-chassis interconnects on top of that, we need to multiply this 
>> error bound at each stage in the hierarchy, or else we'll end up 
>> wasting CPU cycles on migrations that actually hurt the processes 
>> they're supposed to be helping, and hurt everyone else even more.  I 
>> believe we should enforce an error bound that is proportional to 
>> migration cost.
>>
> 
> I think we are actually in agreement. When I say constant bound, it can 
> certainly be a constant that's determined based on inputs from the 
> memory hierarchy. The point is that it needs to be a constant 
> independent of things like # of tasks.

Agreed.
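
To make that concrete, here is a rough sketch of a bound that is multiplied 
at each stage of the hierarchy but never depends on the number of tasks.  
Everything in it is hypothetical: the level names, the per-stage factors, 
and the helper functions are invented for illustration and are not taken 
from CFS or from the DWRR patch.  Real factors would have to come from 
measured migration costs.

```c
#include <assert.h>

/* Hypothetical topology stages, innermost first. */
enum topo_level { TOPO_SMT, TOPO_CORE, TOPO_NODE, TOPO_CHASSIS, TOPO_NR_LEVELS };

/*
 * Made-up per-stage multipliers standing in for relative migration cost.
 * Crossing into each outer stage multiplies the allowed error again.
 */
static const unsigned long stage_factor[TOPO_NR_LEVELS] = { 1, 4, 4, 4 };

/*
 * Allowed unfairness (in ns of CPU time) before a migration across
 * `level` is considered worthwhile.  The result depends only on the
 * base bound and the topology level, never on the number of tasks.
 */
unsigned long fairness_bound_ns(unsigned long base_bound_ns, enum topo_level level)
{
	unsigned long bound = base_bound_ns;
	int i;

	assert((int)level >= 0 && level < TOPO_NR_LEVELS);
	for (i = 0; i <= (int)level; i++)
		bound *= stage_factor[i];
	return bound;
}

/* Nonzero if a task lagging by lag_ns justifies a migration across `level`. */
int should_migrate(unsigned long lag_ns, unsigned long base_bound_ns,
		   enum topo_level level)
{
	return lag_ns > fairness_bound_ns(base_bound_ns, level);
}
```

With a 1000 ns base bound, these placeholder factors tolerate 4000 ns of lag 
before moving a task between cores, and 64000 ns before moving it between 
chassis.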

>> But this patch is only relevant to SCHED_OTHER.  The realtime 
>> scheduler doesn't have a concept of fairness, just priorities.  That's 
>> why each realtime priority level has its own separate runqueue.  
>> Realtime schedulers are supposed to be dumb as a post, so they cannot 
>> heuristically decide to do anything other than precisely what you 
>> configured them to do, and so they don't get in the way when you're 
>> context switching a million times a second.
> 
> Are you referring to hard real-time? As I said, an infrastructure that 
> enables p-fair scheduling, EDF, or things alike is the foundation for 
> real-time. I designed DWRR, however, with a target of non-RT apps, 
> although I was hoping the research results might be applicable to RT.

I'm referring to the static priority SCHED_FIFO and SCHED_RR schedulers, 
which are (intentionally) dumb as a post, allowing userspace to manage 
CPU time explicitly.  Proportionally fair scheduling is a cool 
capability, but not a design goal of those schedulers.
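
For reference, this is roughly what "managing CPU time explicitly" looks 
like from userspace.  It is a minimal sketch using the POSIX 
sched_setscheduler() interface.  The helper names rt_param_for() and 
make_self_fifo() are made up for this example, and the call normally fails 
with EPERM unless the caller has the appropriate privilege.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <string.h>

/*
 * Fill *sp with a priority `offset` steps above the minimum for `policy`,
 * clamped to the policy's maximum.  Returns 0 on success, or -1 if the
 * priority range for `policy` cannot be queried.
 */
int rt_param_for(int policy, int offset, struct sched_param *sp)
{
	int lo = sched_get_priority_min(policy);
	int hi = sched_get_priority_max(policy);
	int prio;

	assert(offset >= 0);
	if (lo < 0 || hi < 0)
		return -1;

	prio = lo + offset;
	if (prio > hi)
		prio = hi;

	memset(sp, 0, sizeof(*sp));
	sp->sched_priority = prio;
	return 0;
}

/*
 * Try to switch the calling process to SCHED_FIFO.  Returns 0 on success
 * or the errno value of the failing call.
 */
int make_self_fifo(int offset)
{
	struct sched_param sp;

	if (rt_param_for(SCHED_FIFO, offset, &sp) != 0)
		return errno;
	if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
		return errno;
	return 0;
}
```

Once a task is SCHED_FIFO, the kernel makes no fairness decisions for it.  
It runs until it blocks, yields, or is preempted by a higher static 
priority, and any sharing among such tasks is up to whoever assigned the 
priorities.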

	-- Chris

Thread overview: 45+ messages
2007-07-23 18:38 [RFC] scheduler: improve SMP fairness in CFS Tong Li
2007-07-23 20:00 ` Andi Kleen
2007-07-23 21:10   ` Li, Tong N
2007-07-23 21:25     ` Chris Friesen
2007-07-24  9:43       ` Andi Kleen
2007-07-23 23:40 ` Chris Snook
2007-07-24  8:07   ` Chris Snook
2007-07-24 17:11     ` Li, Tong N
2007-07-24 17:07   ` Tong Li
2007-07-24 18:08     ` Chris Snook
2007-07-24 19:47       ` Chris Friesen
2007-07-24 20:39         ` Chris Snook
2007-07-24 20:58           ` Li, Tong N
2007-07-24 21:09             ` Chris Snook
2007-07-24 21:23               ` Chris Friesen
2007-07-24 21:45                 ` Chris Snook
2007-07-24 23:33                   ` Chris Friesen
2007-07-24 21:06           ` Bill Huey
2007-07-24 21:22             ` Chris Snook
2007-07-24 23:14               ` Bill Huey
2007-07-24 21:12           ` Chris Friesen
2007-07-25 11:01 ` Ingo Molnar
2007-07-25 12:03   ` Ingo Molnar
2007-07-25 17:23     ` Tong Li
2007-07-25 19:24       ` Ingo Molnar
2007-07-25 20:38         ` Chris Friesen
2007-07-25 20:55           ` Chris Snook
2007-07-25 21:15             ` Li, Tong N
2007-07-25 22:24               ` Chris Snook
2007-07-26 19:00         ` Tong Li
2007-07-26 21:31           ` Ingo Molnar
2007-07-26 22:00             ` Li, Tong N
2007-07-27  1:34               ` Tong Li
2007-07-27 17:16                 ` Chris Snook
2007-07-27 19:03                   ` Tong Li
2007-07-27 22:20                     ` Bill Huey
2007-07-27 23:36                     ` Chris Snook
2007-07-28  0:54                       ` Bill Huey
2007-07-28  2:59                         ` Chris Snook
2007-07-28 19:38                           ` Tong Li
2007-07-29  2:40                             ` Chris Snook [this message]
2007-07-28 19:23                       ` Tong Li
2007-07-29  3:01                         ` Chris Snook
2007-07-25 18:20     ` Li, Tong N
2007-07-25 19:18       ` Ingo Molnar
