public inbox for linux-kernel@vger.kernel.org
From: Gregory Haskins <ghaskins@novell.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Nick Piggin <nickpiggin@yahoo.com.au>, vatsa <vatsa@in.ibm.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	"D. Bahi" <dbahi@novell.com>
Subject: Re: [PATCH] sched: properly account IRQ and RT load in SCHED_OTHER load balancing
Date: Thu, 21 Aug 2008 08:26:08 -0400
Message-ID: <48AD5EE0.8070407@novell.com>
In-Reply-To: <20080821114126.GB30667@elte.hu>


Ingo Molnar wrote:
> * Gregory Haskins <ghaskins@novell.com> wrote:
>
>   
>> I haven't had a chance to review the code thoroughly yet, but I had 
>> been working on a similar fix and know that this is sorely needed.  
>> So...
>>     
>
> btw., why exactly does this patch speed up certain workloads? I'm not 
> quite sure about the exact reasons of that.
>
> 	Ingo
>   

I used to have a great demo for the prototype I was working on, but I'd 
have to dig it up.  The gist of it is that the pre-patched scheduler 
basically gets thrown for a complete loop in the presence of a mixed 
CFS/RT environment.  This isn't a PREEMPT_RT-specific problem per se, 
though PREEMPT_RT does bring the problem to the forefront, since it has 
so many active RT tasks by default (for the IRQs, etc.), which makes it 
more evident.

Since the "load" that an RT task previously declared did not actually 
express the true nature of the RQ load, CFS tasks would have a few 
really nasty things happen to them while trying to run on the system 
simultaneously.  One of them was that you could starve out CFS tasks 
on certain cores (even though there was plenty of CPU bandwidth 
available elsewhere) and the load-balancer would think everything was 
fine and thus fail to make adjustments.

Say you have a 4-core system.  You could, for instance, get into a 
situation where the softirq-net-rx thread was consuming 80% of core 0, 
yet the load balancer would still spread, say, a 40-thread CFS load 
evenly across all cores (approximately 10 per core, though you would 
account for the "load" that the softirq thread contributed too).  The 
threads on the other cores would of course enjoy 100% bandwidth, while 
the ~10 threads on core 0 would only see 1/5th of that bandwidth.

What it comes down to is that the CFS load should have been evenly 
distributed across the available bandwidth of 3*100% + 1*20%, not 4*100% 
as it is today.  The net result is that the application performs in a 
very lopsided manner, with some threads getting significantly less (or 
sometimes zero!) CPU time compared to their peers.  You can make this 
more obvious by nice'ing the CFS load up as high as it will go, which 
will approximate 1/2 of the load of the softirq (since RT tasks 
previously enjoyed a 2*MAX_SCHED_OTHER_LOAD rating).
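
To put numbers on the 4-core example above, here is a minimal Python 
sketch.  It is an illustration only, not code from the patch: the 
function names are made up, the softirq thread's own "load" rating is 
ignored in the even spread, and every thread on a core is assumed to get 
an equal slice of whatever bandwidth the RT thread leaves behind.

```python
def per_thread_share(threads_per_core, capacity):
    """Fraction of one core that each CFS thread receives, per core."""
    return [cap / n if n else 0.0
            for n, cap in zip(threads_per_core, capacity)]


def proportional_split(total_threads, capacity):
    """Spread threads in proportion to each core's leftover capacity."""
    total = sum(capacity)
    return [total_threads * cap / total for cap in capacity]


# Core 0 loses 80% to the softirq thread; the other three are untouched.
capacity = [0.2, 1.0, 1.0, 1.0]

# What the example describes: 40 threads spread evenly, 10 per core.
even = [10, 10, 10, 10]

# What it should look like: 40 threads spread over 3*100% + 1*20%.
weighted = proportional_split(40, capacity)

even_shares = per_thread_share(even, capacity)
weighted_shares = per_thread_share(weighted, capacity)

print("even spread:        ", [round(s, 3) for s in even_shares])
print("proportional spread:", [round(s, 3) for s in weighted_shares])
```

With these inputs the even spread leaves each thread on core 0 with a 
fifth of what its peers on the other cores get, while the proportional 
split gives every thread the same share.  The fractional thread counts 
are an artifact of the idealized arithmetic; a real balancer can only 
move whole tasks.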

I have observed this phenomenon (and its fix) while looking at things 
like network intensive workloads.  I'm sure there are plenty of others 
that could cause similar ripples.

The fact is, the scheduler treats "load" to mean certain things which 
simply did not apply to RT tasks.  As I'm sure you know very well ;), 
"load" is a metric which expresses the share of the CPU that will be 
consumed, and this is used by the load balancer to make its decisions.  
However, you can put whatever rating you want on an RT task and it would 
always be irrelevant.  RT tasks run as frequently and as long as they 
want (w.r.t. SCHED_OTHER) independent of what their load rating implies 
to the balancer, so you cannot make an accurate assessment of the true 
"available shares".  This is why the load-balancer would become confused 
and fail to see true imbalance in a mixed environment.  Fixing this, as 
Peter has attempted to do, will result in a much better distribution of 
SCHED_OTHER tasks across the true available bandwidth, and thus improve 
overall performance.
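
The point that an RT task's rating is irrelevant can be sketched in a 
few lines of Python.  This is a toy model, not the scheduler's actual 
arithmetic: the weight of 1024 per CFS task, the RT ratings tried, and 
the function names are all assumptions picked for illustration.  It 
compares the CPU share that the weights would imply for the CFS tasks 
on one core against the share they really get when an RT thread runs 
80% of the time.

```python
def expected_cfs_share(cfs_weights, rt_weight):
    """CPU share the load weights imply for all CFS tasks on one core."""
    cfs_total = sum(cfs_weights)
    return cfs_total / (cfs_total + rt_weight)


def actual_cfs_share(rt_utilization):
    """CPU share CFS tasks really get: whatever the RT task leaves over."""
    return 1.0 - rt_utilization


cfs_weights = [1024] * 10   # ten equal-weight CFS tasks (assumed weight)
rt_utilization = 0.8        # the RT thread runs 80% of the time

# Try several arbitrary ratings for the same RT thread.
for rt_weight in (1024, 10 * 1024, 100 * 1024):
    print(rt_weight,
          round(expected_cfs_share(cfs_weights, rt_weight), 3),
          round(actual_cfs_share(rt_utilization), 3))
```

The implied share swings widely as the rating changes, while the real 
share never moves, which is the mismatch that hides the imbalance from 
the balancer.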

In previous discussions with people, I had always used a metaphor of a 
stream.  A system running SCHED_OTHER tasks is like a smooth running 
stream, but dispatching an RT task (or an IRQ, even) is like throwing a 
boulder into the water.  It makes a big disruptive splash and causes 
turbulent white water behind it.  And the stream has no influence over 
the size of the boulder, its placement in the stream, nor how long it 
will be staying.

This fix (at least in concept) allows it to become more like gently 
slipping a streamlined aerodynamic object into the water.  The stream 
still cannot do anything about the size or placement of the object, but 
it can at least flow around it and smoothly adapt to the reduced volume 
of water that the stream can carry. :)

HTH
-Greg



Thread overview: 11+ messages
2008-08-21  9:18 [PATCH] sched: properly account IRQ and RT load in SCHED_OTHER load balancing Peter Zijlstra
2008-08-21 10:47 ` Ingo Molnar
2008-08-21 11:17   ` Ingo Molnar
2008-08-21 11:22     ` Peter Zijlstra
2008-08-21 11:40       ` Ingo Molnar
2008-08-21 11:36 ` Gregory Haskins
2008-08-21 11:41   ` Ingo Molnar
2008-08-21 12:26     ` Gregory Haskins [this message]
2008-08-21 12:43 ` Peter Zijlstra
2008-08-21 12:47   ` Gregory Haskins
2008-08-21 12:56     ` Peter Zijlstra
