public inbox for linux-kernel@vger.kernel.org
From: Chris Snook <csnook@redhat.com>
To: Andrea Arcangeli <andrea@suse.de>
Cc: tim.c.chen@linux.intel.com, mingo@elte.hu, linux-kernel@vger.kernel.org
Subject: Re: pluggable scheduler thread (was Re: Volanomark slows by 80% under CFS)
Date: Sat, 28 Jul 2007 02:51:12 -0400
Message-ID: <46AAE760.9030602@redhat.com>
In-Reply-To: <20070728050141.GC31622@v2.random>

Andrea Arcangeli wrote:
> On Fri, Jul 27, 2007 at 11:43:23PM -0400, Chris Snook wrote:
>> I'm pretty sure the point of posting a patch that triples CFS performance 
>> on a certain benchmark and arguably improves the semantics of sched_yield 
>> was to improve CFS.  You have a point, but it is a point for a different 
>> thread.  I have taken the liberty of starting this thread for you.
> 
> I've no real interest in starting or participating in flamewars
> (especially the ones not backed by hard numbers). So I adjusted the
> subject a bit in the hope the discussion will not degenerate as you
> predicted, hope you don't mind.

Not at all.  I clearly misread your tone.

> I'm pretty sure the point of posting that email was to show the
> remaining performance regression with the sched_yield fix applied
> too. Given you considered my post both offtopic and inflammatory, I
> guess you think it's possible and reasonably easy to fix that
> remaining regression without a pluggable scheduler, right? So please
> enlighten us on how you intend to achieve it.

There are four possibilities that are immediately obvious to me:

a) The remaining difference is due mostly to the algorithmic complexity 
of the rbtree algorithm in CFS.

If this is the case, we should be able to vary the test parameters (CPU 
count, thread count, etc.), graph the results, and see a roughly 
logarithmic divergence between the schedulers as some parameter(s) vary. 
If this is the problem, we may be able to fix it with data structure 
tweaks or optimized base cases, like how quicksort can be optimized by 
using insertion sort below a certain threshold.
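To make the quicksort analogy concrete, here is a minimal C sketch of that 
kind of base-case optimization. It only illustrates the general technique; 
it is not CFS or rbtree code. The cutoff of 16 (CUTOFF) and the names 
sort_ints() and is_sorted() are hypothetical choices for this example, and 
real tuned thresholds vary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical threshold: partitions at or below this size are handed to
 * insertion sort, whose low constant factor beats quicksort on tiny inputs. */
#define CUTOFF 16

static void swap_int(int *x, int *y)
{
	int t = *x;
	*x = *y;
	*y = t;
}

/* Plain insertion sort of a[lo..hi], inclusive bounds. */
static void insertion_sort(int *a, long lo, long hi)
{
	for (long i = lo + 1; i <= hi; i++) {
		int key = a[i];
		long j = i - 1;
		while (j >= lo && a[j] > key) {
			a[j + 1] = a[j];
			j--;
		}
		a[j + 1] = key;
	}
}

/* Lomuto partition around the middle element; returns the pivot's final index. */
static long partition(int *a, long lo, long hi)
{
	swap_int(&a[lo + (hi - lo) / 2], &a[hi]);
	int pivot = a[hi];
	long i = lo;
	for (long j = lo; j < hi; j++)
		if (a[j] < pivot)
			swap_int(&a[i++], &a[j]);
	swap_int(&a[i], &a[hi]);
	return i;
}

/* Quicksort a[lo..hi], switching to insertion sort at or below CUTOFF elements. */
static void hybrid_sort(int *a, long lo, long hi)
{
	while (hi - lo + 1 > CUTOFF) {
		long p = partition(a, lo, hi);
		/* Recurse on the smaller side and loop on the larger one,
		 * which keeps the recursion depth logarithmic. */
		if (p - lo < hi - p) {
			hybrid_sort(a, lo, p - 1);
			lo = p + 1;
		} else {
			hybrid_sort(a, p + 1, hi);
			hi = p - 1;
		}
	}
	insertion_sort(a, lo, hi);
}

void sort_ints(int *a, size_t n)
{
	assert(a != NULL || n == 0);
	if (n > 1)
		hybrid_sort(a, 0, (long)n - 1);
}

/* Returns 1 if a[0..n-1] is in non-decreasing order, else 0. */
int is_sorted(const int *a, size_t n)
{
	for (size_t i = 1; i < n; i++)
		if (a[i - 1] > a[i])
			return 0;
	return 1;
}
```

The open question for (a) would be whether an analogous small-case 
shortcut pays off for CFS at the runqueue lengths volanomark produces.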

b) The remaining difference is due mostly to how the scheduler handles 
volanomark.

vmstat can give us a comparison of context switches between O(1), CFS, 
and CFS+patch.  If the decrease in throughput correlates with an 
increase in context switches, we may be able to induce more O(1)-like 
behavior by charging tasks for context switch overhead.
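For collecting the context switch numbers directly rather than scraping 
vmstat output, here is a minimal C sketch, assuming Linux, that reads the 
cumulative "ctxt" counter from /proc/stat. As far as I know, that is the 
counter vmstat's cs column is derived from. The helpers parse_ctxt_line(), 
read_ctxt() and cs_rate() are hypothetical names for this example.

```c
#define _POSIX_C_SOURCE 200809L /* for getline() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* If line is the "ctxt N" line of /proc/stat, return N; otherwise -1. */
long long parse_ctxt_line(const char *line)
{
	if (strncmp(line, "ctxt ", 5) != 0)
		return -1;
	return strtoll(line + 5, NULL, 10);
}

/* Return the cumulative context switch count since boot, or -1 on error. */
long long read_ctxt(void)
{
	FILE *f = fopen("/proc/stat", "r");
	char *line = NULL;
	size_t cap = 0;
	long long ctxt = -1;

	if (!f)
		return -1;
	/* getline() copes with the very long "intr" line on large machines. */
	while (getline(&line, &cap, f) != -1) {
		ctxt = parse_ctxt_line(line);
		if (ctxt >= 0)
			break;
	}
	free(line);
	fclose(f);
	return ctxt;
}

/* Context switches per second between two samples taken `seconds` apart. */
double cs_rate(long long before, long long after, double seconds)
{
	assert(seconds > 0.0);
	return (double)(after - before) / seconds;
}
```

Sampling read_ctxt() before and after a run under each scheduler and 
passing the two values plus the elapsed time to cs_rate() gives the rate to 
set against throughput for O(1), CFS, and CFS+patch.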

c) The remaining difference is due mostly to how the scheduler handles 
something other than volanomark.

If context switch count is not the problem, the context switch pattern 
still could be.  I doubt we'd see a 40% difference due to cache misses, but 
it's possible.  Fortunately, oprofile can sample based on cache misses, 
so we can debug this too.

d) The remaining difference is due mostly to some implementation detail 
in CFS.

It's possible there's some constant-factor overhead in CFS that is 
magnified heavily by the context switching volanomark deliberately 
induces.  If this is the case, oprofile sampling on clock cycles should 
catch it.

Tim --

	Since you're already set up to do this benchmarking, would you mind 
varying the parameters a bit and collecting vmstat data?  If you want to 
run oprofile too, that wouldn't hurt.

> Also consider the other numbers likely used nptl so they shouldn't be
> affected by sched_yield changes.
> 
>> Sure there is.  We can run a fully-functional POSIX OS without using any 
>> block devices at all.  We cannot run a fully-functional POSIX OS without a 
>> scheduler. Any feature without which the OS cannot execute userspace code 
>> is sufficiently primitive that somewhere there is a device on which it will 
>> be impossible to debug if that feature fails to initialize.  It is quite 
>> reasonable to insist on only having one implementation of such features in 
>> any given kernel build.
> 
> Sounds like a red-herring to me... There aren't just pluggable I/O
> schedulers in the kernel, there are pluggable packet schedulers too
> (see `tc qdisc`). And both are switchable at runtime (not just at boot
> time).
> 
> Can you run your fully-functional POSIX OS without a packet scheduler
> and without an I/O scheduler? I wonder where you are going to
> read/write data without HD and network?

If I'm missing both, I'm pretty screwed, but if either one is 
functional, I can send something out.

> Also those pluggable things don't increase the risk of crash much, if
> compared to the complexity of the schedulers.
> 
>> Whether or not these alternatives belong in the source tree as config-time 
>> options is a political question, but preserving boot-time debugging 
>> capability is a perfectly reasonable technical motivation.
> 
> The scheduler is invoked very late in the boot process (printk and
> serial console, kdb are working for ages when scheduler kicks in), so
> it's fully debuggable (no debugger depends on the scheduler, they run
> inside the nmi handler...), I don't really see your point.

I'm more concerned about embedded systems.  These are the same people 
who want userspace character drivers to control their custom hardware. 
Having the robot point to where it hurts is a lot more convenient than 
hooking up a JTAG debugger.

> And even if there would be a subtle bug in the scheduler you'll never
> trigger it at boot with so few tasks and so few context switches.

Sure, but it's the non-subtle bugs that worry me.  These are usually 
related to low-level hardware setup, so they could miss the mainstream 
developers and clobber unsuspecting embedded developers.

I acknowledge that debugging such problems shouldn't be terribly hard on 
mainstream systems, but some people are going to want to choose a single 
scheduler at build time and avoid the hassle.  If we can improve CFS to 
be regression-free, and I think we can if we give ourselves a few 
percent tolerance and keep tracking down the corner cases, the pluggable 
scheduler infrastructure will just be another disused feature.

	-- Chris
