From: Micah Dowty <micah@vmware.com>
To: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>, Christoph Lameter <clameter@sgi.com>,
	Kyle Moffett <mrmacman_g4@mac.com>,
	Cyrus Massoumi <cyrusm@gmx.net>,
	LKML Kernel <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@osdl.org>, Mike Galbraith <efault@gmx.de>,
	Paul Menage <menage@google.com>,
	Peter Williams <pwil3058@bigpond.net.au>
Subject: Re: High priority tasks break SMP balancer?
Date: Fri, 16 Nov 2007 14:14:04 -0800
Message-ID: <20071116221404.GC31527@vmware.com>
In-Reply-To: <b647ffbd0711160248v444d8f64ic72a17fcb56adc6d@mail.gmail.com>

On Fri, Nov 16, 2007 at 11:48:50AM +0100, Dmitry Adamushko wrote:
> could you try to change either :
> 
> cat /proc/sys/kernel/sched_stat_granularity
> 
> put it to the value equal to a tick on your system

This didn't seem to have any effect.

> or just clear bit #3 (value 8, i.e. binary 1000) here:
> 
> cat /proc/sys/kernel/sched_features
> 
> (this one is enabled by default in 2.6.23.1)

Aha. Turning off bit 3 appears to instantly fix my problem while it's
occurring in an existing process, and I can't reproduce it on any new
processes afterward.
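
For anyone repeating this, the bit arithmetic can be sketched as below.
This is only an illustration: clear_feature_bit() is a hypothetical
helper, the sample value is made up, and nothing here reads or writes
/proc/sys/kernel/sched_features.

```python
def clear_feature_bit(features, bit):
    """Return `features` with the given bit cleared (bit 3 has value 8)."""
    return features & ~(1 << bit)


# Hypothetical starting value, NOT the real 2.6.23.1 default.
example = 0b1111
new_value = clear_feature_bit(example, 3)  # bits 0-2 stay set, bit 3 is cleared
```

The resulting integer is what would then be written back to the sysctl
by hand.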

> anyway, when it comes to calculating rq->cpu_load[], a nice(0) cpu-hog
> task (on cpu_0) may generate a similar load (contribute to
> rq->cpu_load[]) as e.g. some negatively reniced task (on cpu_1) which
> runs only periodically (say, once in a tick for N ms., etc.) [*]
> 
> The thing is that the higher the prio of the task, the bigger the 'weight'
> it has (prio_to_weight[] table in sched.c) ... and roughly, the load it
> generates is not only 'proportional' to 'run-time per fixed interval
> of time' but also to its 'weight'. That's why the [*] above.

Right. I gathered from reading the scheduler source earlier that the
load average is intended to be proportional to the priority of the
task, but I was really confused by the fairly nondeterministic effect
on the cpu_load average that my test process is having.

> so you may have a situation :
> 
> cpu_0 : e.g. a nice(-20) task running periodically every tick and
> generating, say ~10% cpu load ;

Part of the problem may be that my high-priority task can run much
more often than every tick. In my test case and in the VMware code
that I originally observed the problem in, the thread can wake up
based on /dev/rtc or on a device IRQ. Either of these can happen much
more frequently than the scheduler tick, if I understand correctly.

> cpu_1 : 2-3 nice(0) cpu-hog tasks ;
> 
> both cpus may be seen with similar rq->cpu_load[]...

When I try this, cpu0 has a cpu_load[] of over 10000 and cpu1 has a
load of 2048 or so.
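
As a rough sketch of the arithmetic in the scenario above (not the
kernel's code): assume a nice(0) task weighs 1024 and each nice level
scales the weight by about 1.25x, which only approximates the
prio_to_weight[] table. approx_weight() and load_contribution() are
hypothetical names, and "load" here is simply weight times the fraction
of CPU time used.

```python
NICE_0_LOAD = 1024  # weight of a nice(0) task


def approx_weight(nice):
    """Approximate task weight; assumes ~1.25x per nice level."""
    return NICE_0_LOAD * 1.25 ** (-nice)


def load_contribution(nice, cpu_fraction):
    """Idealized load: weight scaled by the share of CPU time used."""
    return approx_weight(nice) * cpu_fraction


# cpu_0: one nice(-20) task using ~10% of the CPU
cpu0 = load_contribution(-20, 0.10)
# cpu_1: two or three nice(0) cpu hogs
cpu1_two = 2 * load_contribution(0, 1.0)
cpu1_three = 3 * load_contribution(0, 1.0)
```

Under these assumptions the lightly loaded cpu_0 comes out at several
thousand, the same order of magnitude as (and somewhat above) a CPU
running two or three full-time nice(0) hogs.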

> yeah, one would
> argue that one of the cpu hogs could be migrated to cpu_0 and consume
> remaining 'time slots' and it would not "disturb" the nice(-20) task
> as :
> it's able to preempt the lower prio task whenever it wants (provided,
> fine-grained kernel preemption) and we don't care that much about
> thrashing of caches here.

Yes, that's the behaviour I expected to see (and what my application
would prefer).

> btw., without the precise load balancing, there can be situations when
> the nice(-20) (or say, a RT periodic task) may not even be seen (i.e.
> doesn't contribute to cpu_load[]) on cpu_0...
> we do sampling every tick (sched.c :: update_cpu_load()) and consider
> this_rq->ls.load.weight at this particular moment (that is the sum of
> 'weights' for all runnable tasks on this rq)... and it may well be
> that the aforementioned high-priority task is just never (or likely,
> rarely) runnable at this particular moment (it runs for short interval
> of time in between ticks).

Indeed. I think this is the major contributor to the nondeterminism
I'm seeing.

Thanks much,
--Micah

